trouble understanding implementation of hash table with chaining


I'm studying a Java implementation of a hash table with chaining, and I'm having trouble with the get() method. The index is computed as key.hashCode() % table.length. Assume the table size is 10 and key.hashCode() is 124, so the index is 4. The for-each loop starts at table[4], and AFAIK the index is then incremented one by one: 4, 5, 6, 7, and so on. But what about indices 0, 1, 2, 3? Are they checked? (I think not.) Isn't it possible for the key to be at one of those indices? (I think yes.) The other issue is that there are null checks, but initially there is no null assignment for key or value. So how can the checks work? Is null assigned as soon as private LinkedList<Entry<K, V>>[] table is declared?
// Data Structures: Abstraction and Design Using Java, Koffman, Wolfgang
package KW.CH07;
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
/**
* Hash table implementation using chaining.
* @param <K> The key type
* @param <V> The value type
* @author Koffman and Wolfgang
**/
public class HashtableChain<K, V>
// Insert solution to programming project 7, chapter -1 here
implements KWHashMap<K, V> {
/** The table */
private LinkedList<Entry<K, V>>[] table;
/** The number of keys */
private int numKeys;
/** The capacity */
private static final int CAPACITY = 101;
/** The maximum load factor */
private static final double LOAD_THRESHOLD = 3.0;
// Note this is equivalent to java.util.AbstractMap.SimpleEntry
/** Contains key-value pairs for a hash table.
@param <K> the key type
@param <V> the value type
*/
public static class Entry<K, V>
// Insert solution to programming project 6, chapter -1 here
{
/** The key */
private final K key;
/** The value */
private V value;
/**
* Creates a new key-value pair.
* @param key The key
* @param value The value
*/
public Entry(K key, V value) {
this.key = key;
this.value = value;
}
/**
* Retrieves the key.
* @return The key
*/
@Override
public K getKey() {
return key;
}
/**
* Retrieves the value.
* @return The value
*/
@Override
public V getValue() {
return value;
}
/**
* Sets the value.
* @param val The new value
* @return The old value
*/
@Override
public V setValue(V val) {
V oldVal = value;
value = val;
return oldVal;
}
// Insert solution to programming exercise 3, section 4, chapter 7 here
}
// Constructor
public HashtableChain() {
table = new LinkedList[CAPACITY];
}
// Constructor for test purposes
HashtableChain(int capacity) {
table = new LinkedList[capacity];
}
/**
* Method get for class HashtableChain.
* @param key The key being sought
* @return The value associated with this key if found;
* otherwise, null
*/
@Override
public V get(Object key) {
int index = key.hashCode() % table.length;
if (index < 0) {
index += table.length;
}
if (table[index] == null) {
return null; // key is not in the table.
}
// Search the list at table[index] to find the key.
for (Entry<K, V> nextItem : table[index]) {
if (nextItem.getKey().equals(key)) {
return nextItem.getValue();
}
}
// assert: key is not in the table.
return null;
}
/**
* Method put for class HashtableChain.
* @post This key-value pair is inserted in the
* table and numKeys is incremented. If the key is already
* in the table, its value is changed to the argument
* value and numKeys is not changed.
* @param key The key of item being inserted
* @param value The value for this key
* @return The old value associated with this key if
* found; otherwise, null
*/
@Override
public V put(K key, V value) {
int index = key.hashCode() % table.length;
if (index < 0) {
index += table.length;
}
if (table[index] == null) {
// Create a new linked list at table[index].
table[index] = new LinkedList<>();
}
// Search the list at table[index] to find the key.
for (Entry<K, V> nextItem : table[index]) {
// If the search is successful, replace the old value.
if (nextItem.getKey().equals(key)) {
// Replace value for this key.
V oldVal = nextItem.getValue();
nextItem.setValue(value);
return oldVal;
}
}
// assert: key is not in the table, add new item.
table[index].addFirst(new Entry<>(key, value));
numKeys++;
if (numKeys > (LOAD_THRESHOLD * table.length)) {
rehash();
}
return null;
}
/** Returns true if empty
@return true if empty
*/
@Override
public boolean isEmpty() {
return numKeys == 0;
}
}

Assume that the table size is 10 and key.hashCode() is 124 so index is found as 4. In for each loop table[index] is started from table[4]
Correct.
there are null checks but initially there is no any null assignment for key and value. So how can the checking work?
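When an array of object references is created in Java, every element is initialized to null. So new LinkedList[CAPACITY] in the constructor gives you a table whose slots are all null until put() creates a list in one of them.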
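To see this for yourself, here is a minimal sketch. The class name NullDefaultDemo and the method countNullBuckets are made up for illustration and are not part of the book's code:

```java
import java.util.LinkedList;

// Hypothetical demo class, not part of the book's HashtableChain.
class NullDefaultDemo {
    // Counts how many slots of a freshly created bucket array are null.
    @SuppressWarnings("unchecked")
    static int countNullBuckets(int capacity) {
        LinkedList<String>[] table = new LinkedList[capacity];
        int nulls = 0;
        for (LinkedList<String> bucket : table) {
            if (bucket == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        // Every element of a new object array starts out as null.
        System.out.println(countNullBuckets(10)); // prints 10
    }
}
```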
index is being incremented one by one 4,5,6,7... so on. But what about indices 0,1,2,3? Are they been checked? (I think no) Isn't there any possibility that occurring of key on one of the indices? (I think yes).
Looks like there's some misunderstanding here. First, think of the data structure like this (with data having already been added to it):
table:
[0] -> null
[1] -> LinkedList -> item 1 -> item 2 -> item 3
[2] -> LinkedList -> item 1
[3] -> null
[4] -> LinkedList -> item 1 -> item 2
[5] -> LinkedList -> item 1 -> item 2 -> item 3 -> item 4
[6] -> null
Another important point is that the hash code for a given key should not change, so it will always map to the same index in the table.
So say we call get with a key whose hash code maps it to 3; then we know that it's not in the table:
if (table[index] == null) {
return null; // key is not in the table.
}
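If another key comes in that maps to 1, now we need to iterate over the LinkedList stored at table[1]. The for-each loop walks the entries of that one list; it never moves on to table[2], table[3], and so on: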
// LinkedList<Entry<K, V>> list = table[index]
for (Entry<K, V> nextItem : table[index]) {
// iterate over item 1, item 2, item 3 until we find one that is equal.
if (nextItem.getKey().equals(key)) {
return nextItem.getValue();
}
}
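Putting the pieces together, here is a small sketch of a chained table with Integer keys. It is an illustration under my own assumptions, not the book's class: BucketLookupDemo and indexFor are hypothetical names, and Math.floorMod stands in for the % plus negative-index fix-up used in the question's code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical demo class; not part of the book's HashtableChain.
class BucketLookupDemo {
    private final LinkedList<Map.Entry<Integer, String>>[] table;

    @SuppressWarnings("unchecked")
    BucketLookupDemo(int capacity) {
        table = new LinkedList[capacity]; // every slot starts as null
    }

    // Same result as hashCode() % length followed by the "index < 0" fix-up.
    int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    void put(Integer key, String value) {
        int index = indexFor(key);
        if (table[index] == null) {
            table[index] = new LinkedList<>();
        }
        for (Map.Entry<Integer, String> entry : table[index]) {
            if (entry.getKey().equals(key)) {
                entry.setValue(value);
                return;
            }
        }
        table[index].addFirst(new SimpleEntry<>(key, value));
    }

    String get(Integer key) {
        LinkedList<Map.Entry<Integer, String>> bucket = table[indexFor(key)];
        if (bucket == null) {
            return null; // nothing was ever stored at this index
        }
        // The loop walks the entries of this one list, not the table indices.
        for (Map.Entry<Integer, String> entry : bucket) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BucketLookupDemo demo = new BucketLookupDemo(10);
        demo.put(124, "first");  // 124 % 10 == 4
        demo.put(4, "second");   // also lands in table[4]
        System.out.println(demo.get(124)); // first
        System.out.println(demo.get(4));   // second
        System.out.println(demo.get(14));  // null: table[4] is searched, no match
        System.out.println(demo.get(3));   // null: table[3] is still null
    }
}
```

Keys 124 and 4 share table[4], so a lookup of either one only ever has to scan that single list.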

I think you aren't quite visualizing your hash table correctly. There are two equally good simple implementations of a hash table.
Method 1 uses linked lists: An array (well, Vector, actually) of linked lists.
Given a "key", you derive a hash value for that key(*). You take the remainder of that hash value relative to the current size of the vector, let's call that "x". Then you sequentially search the linked list that vector[x] points to for a match to your key.
(*) You hope that the hash values will be reasonably well-distributed. There are complex algorithms for doing this. Let's hope the hashCode implementation of your key class does a good job of this.
Method 2 avoids linked lists: you create a Vector and compute an index into the Vector (as above). Then you look at Vector.get(x). If that's the key you want, you return the corresponding value. Let's assume it's not. Then you look at Vector.get(x+1), Vector.get(x+2), etc., wrapping around to index 0 when you reach the end of the Vector. Eventually, one of the following three things will happen:
a) You find the key you are looking for. Then you return the corresponding value.
b) you find an empty entry (key == null). Return null or whatever value you have chosen to mean "this isn't the droid you're looking for".
c) you have examined every entry in the Vector. Again, return null or whatever.
Checking for (c) is a precaution, so that if the hash table happens to be full you won't loop forever. If the hash table is about to become full (you can keep a count of how many entries have been used), you should reallocate a bigger hash table. Ideally, you want to keep the hash table sparse enough that you never get anywhere near searching the whole table: that vitiates the whole purpose of a hash table, which is that you can search it in much less than linear time, ideally in constant time (that is, the number of comparisons is <= a small constant). I would suggest that you allocate a Vector that is at least 10x the number of entries you expect to put in it.
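Method 2 can be sketched roughly like this. It is only an illustration under my own assumptions: LinearProbingDemo is a hypothetical class, it uses plain arrays instead of a Vector, keys and values are Strings, and there is no resizing or removal.

```java
// Hypothetical open-addressing table with linear probing; no removal support.
class LinearProbingDemo {
    private final String[] keys;
    private final String[] values;

    LinearProbingDemo(int capacity) {
        keys = new String[capacity];   // a null key marks an empty slot
        values = new String[capacity];
    }

    private int startSlot(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns false when every slot is already taken by some other key.
    boolean put(String key, String value) {
        int start = startSlot(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length; // x, x+1, x+2, ... with wrap-around
            if (keys[slot] == null || keys[slot].equals(key)) {
                keys[slot] = key;
                values[slot] = value;
                return true;
            }
        }
        return false;
    }

    String get(String key) {
        int start = startSlot(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (keys[slot] == null) {
                return null;            // case (b): empty entry, key is absent
            }
            if (keys[slot].equals(key)) {
                return values[slot];    // case (a): found the key
            }
        }
        return null;                    // case (c): examined every entry
    }

    public static void main(String[] args) {
        LinearProbingDemo table = new LinearProbingDemo(3);
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");
        System.out.println(table.put("d", "4")); // false: the table is full
        System.out.println(table.get("a"));      // 1
        System.out.println(table.get("d"));      // null, reached through case (c)
    }
}
```

The deliberately tiny capacity of 3 is only there to make case (c) reachable; a real table would grow long before it fills up.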
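The word "chaining" in your question usually refers to the first type, where each slot holds a list of colliding entries. The stepping through indices 4, 5, 6, 7, ... that you describe is how the second type, often called open addressing with linear probing, searches.
Btw, 10 is a poor choice for the size of a hash table that computes its index with the remainder operator. A prime size is the usual recommendation, because it spreads poorly distributed hash codes more evenly across the slots.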
Hope this helps.

Related

How does HashMap ensure the same index when a duplicate key is added with a different `tab.length`?

The following piece of code is used to add an element to a HashMap (from the Android 5.1.1 source tree). I'm very confused by this statement: int index = hash & (tab.length - 1);. How can the map ensure the same index when a duplicate key is added with a different tab.length?
For example, assume that we have a new empty HashMap hMap. First we add the pair ("1","1") to it; assume tab.length equals 1 at this time. Then we add many pairs to this map; assume tab.length now equals "x". Now we add a duplicate pair ("1","1") to it. Notice that tab.length has changed, so the value of int index = hash & (tab.length - 1); may also have changed.
/**
* Maps the specified key to the specified value.
*
* @param key
* the key.
* @param value
* the value.
* @return the value of any previous mapping with the specified key or
* {@code null} if there was no such mapping.
*/
@Override public V put(K key, V value) {
if (key == null) {
return putValueForNullKey(value);
}
int hash = Collections.secondaryHash(key);
HashMapEntry<K, V>[] tab = table;
int index = hash & (tab.length - 1);
for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
if (e.hash == hash && key.equals(e.key)) {
preModify(e);
V oldValue = e.value;
e.value = value;
return oldValue;
}
}
// No entry for (non-null) key is present; create one
modCount++;
if (size++ > threshold) {
tab = doubleCapacity();
index = hash & (tab.length - 1);
}
addNewEntry(key, value, hash, index);
return null;
}

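When the table needs to be rebuilt (here, in doubleCapacity()), the index of every existing entry is first recomputed against the new length, so each stored entry's index follows the change in the table's length. A later put or get of the same key computes its index from the same, current length and therefore lands in the same bucket.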
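A small sketch of that idea, not the Android implementation itself: MaskIndexDemo, indexFor and distribute are hypothetical names, and the table length is assumed to be a power of two so that hash & (length - 1) is a valid index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; only the expression hash & (length - 1) comes from the question.
class MaskIndexDemo {
    // Valid as an index only when tableLength is a power of two.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    // Places every hash into its bucket for the given table length,
    // which is what a rebuild has to do for all existing entries.
    static List<List<Integer>> distribute(int[] hashes, int tableLength) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < tableLength; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int hash : hashes) {
            buckets.get(indexFor(hash, tableLength)).add(hash);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] hashes = {22, 2, 9};
        List<List<Integer>> small = distribute(hashes, 4);
        List<List<Integer>> large = distribute(hashes, 8);

        System.out.println(indexFor(22, 4)); // 2
        System.out.println(indexFor(22, 8)); // 6: the index changed with the length
        // A lookup uses the current length, so it still finds the entry.
        System.out.println(small.get(indexFor(22, 4)).contains(22)); // true
        System.out.println(large.get(indexFor(22, 8)).contains(22)); // true
    }
}
```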

Two-dimensional Map Java

I need a data structure that stores my information in a two-dimensional way. For example, imagine a table that contains user-item ratings. I need to store all ratings for every user: for user u1, for u2, for u3, and so on. But I also need to store all ratings for every item, that is, the ratings provided by all users for each item. So I need something like a map where the key is the user ID and the value is the set of ratings; I can do that easily. My problem is how to also store the ratings per item, for example a map where the key is the item ID and the value is the set of ratings provided by users for that item. I wanted to upload a table, but I don't have enough reputation. So just imagine a two-dimensional matrix whose rows are users and whose columns are items. Is there a data structure that can do that, or should I build two different maps? Maybe there is a better option than a Map, but since I had to choose a title for my question, I wrote "map".
Thanks

You can use the Table class from the free Guava library
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;
Table<Integer, String, Double> table = HashBasedTable.create();
table.put(1, "a", 2.0);
double v = table.get(1, "a"); // getting 2.0

Here is my own version of an appropriate Table object. Mind you, using something provided by an existing library is good. But trying your own implementation will help you understand the issues involved better. So you can try to add "remove" methods etc. to my implementation to complete it.
I prefer keeping the data in a table rather than implementing the maps inside User and Item, because the table can enforce adding each new rating through both row and column. If you keep your maps separate in two independent objects, you won't be able to enforce this.
Note that while I return protective copies of the internal maps in getCol and getRow, I return the reference to the actual values, not copies thereof, so that you can change a user's rating (assuming you chose a mutable object for that) without changing the table structure. Also note that if your user and item objects are mutable and this affects their equals or hashCode, the table will behave unpredictably.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
public class Table<K1, K2, V> {
// Two maps allowing us to retrieve the value through the row or the
// column key.
private Map<K1, Map<K2, V>> rowMap;
private Map<K2, Map<K1, V>> colMap;
public Table() {
rowMap = new HashMap<>();
colMap = new HashMap<>();
}
/**
* Allows us to create a key for the row, and place it in the structure
* while there are still no relations for it.
*
* @param key
* The key for which a new empty row will be created.
*/
public void addEmptyRow(K1 key) {
if (!rowMap.containsKey(key)) {
rowMap.put(key, new HashMap<K2, V>());
}
}
/**
* Allows us to create a key for the column, and place it in the
* structure while there are still no relations for it.
*
* @param key
* The key for which a new empty column will be created.
*/
public void addEmptyCol(K2 key) {
if (!colMap.containsKey(key)) {
colMap.put(key, new HashMap<K1, V>());
}
}
/**
* Insert a value into the table using the two keys.
*
* @param rowKey
* Row key to access this value
* @param colKey
* Column key to access this value
* @param value
* The value to be associated with the above two keys.
*/
public void put(K1 rowKey, K2 colKey, V value) {
Map<K2, V> row;
Map<K1, V> col;
// Find the internal row. If there is no entry, create one.
if (rowMap.containsKey(rowKey)) {
row = rowMap.get(rowKey);
} else {
row = new HashMap<K2, V>();
rowMap.put(rowKey, row);
}
// Find the internal column, If there is no entry, create one.
if (colMap.containsKey(colKey)) {
col = colMap.get(colKey);
} else {
col = new HashMap<K1, V>();
colMap.put(colKey, col);
}
// Add the value to both row and column.
row.put(colKey, value);
col.put(rowKey, value);
}
/**
* Get the value associated with the given row and column.
*
* @param rowKey
* Row key to access the value
* @param colKey
* Column key to access the value
* @return Value in the given row and column. Null if mapping doesn't
* exist
*/
public V get(K1 rowKey, K2 colKey) {
Map<K2, V> row;
row = rowMap.get(rowKey);
if (row != null) {
return row.get(colKey);
}
return null;
}
/**
* Get a map representing the row for the given key. The map contains
* only column keys that actually have values in this row.
*
* @param rowKey
* The key to the row in the table
* @return Map representing the row. Null if there is no row with the
* given key.
*/
public Map<K2, V> getRow(K1 rowKey) {
// Note that we are returning a protective copy of the row. The user
// cannot change the internal structure of the table, but is allowed
// to change the value's state if it is mutable.
if (rowMap.containsKey(rowKey)) {
return new HashMap<>(rowMap.get(rowKey));
}
return null;
}
/**
* Get a map representing the column for the given key. The map contains
* only row keys that actually have values in this column.
*
* @param colKey
* The key to the column in the table.
* @return Map representing the column. Null if there is no column with
* the given key.
*/
public Map<K1, V> getCol(K2 colKey) {
// Note that we are returning a protective copy of the column. The
// user cannot change the internal structure.
if (colMap.containsKey(colKey)) {
return new HashMap<>(colMap.get(colKey));
}
return null;
}
/**
* Get a set of all the existing row keys.
*
* @return A Set containing all the row keys. The set may be empty.
*/
public Set<K1> getRowKeys() {
return new HashSet<>(rowMap.keySet());
}
/**
* Get a set of all the existing column keys.
*
* @return A set containing all the column keys. The set may be empty.
*/
public Set<K2> getColKeys() {
return new HashSet<>(colMap.keySet());
}
}
The reason that I have methods addEmptyRow and addEmptyCol is that I thought it may be redundant to keep a separate data structure for your users and items. Once you add them to the table like this, you can get them through the getRowKeys and getColKeys so there is no need to keep them separately unless you want to structure them in anything other than a Set.
Note that this Table works with the key's value: two keys that are equal according to equals (and therefore have the same hashCode) are treated as the same key, and you should design your User and Item objects accordingly.
With appropriate definitions of User, Item and Rating, you can do something like
Table<User, Item, Rating> table = new Table<>();
table.addEmptyCol(new Item("Television"));
table.addEmptyCol(new Item("Sofa"));
User user = new User("Anakin");
Item item = new Item("Light Sabre");
table.put(user, item, new Rating(5));
Item item1 = new Item("Helmet");
table.put(user, item1, new Rating(7));
Rating rating = table.get(user, item);
rating.setRating(rating.getRating() + 10);
User user1 = new User("Obi-Wan");
table.put(user1, item, new Rating(8));
table.put(user1, new Item("Television"), new Rating(0));
And then query the table for a particular user like so:
Map<Item, Rating> anakinsRatings = table.getRow(user);
for (Map.Entry<Item, Rating> entry : anakinsRatings.entrySet()) {
System.out.println(user + " rated " + entry.getKey() + " as "
+ entry.getValue().getRating());
}
Or display a list of ratings for all items like so:
for (Item currItem : table.getColKeys()) {
Map<User, Rating> itemMap = table.getCol(currItem);
if (itemMap.isEmpty()) {
System.out.println("There are currently no ratings for \""
+ currItem
+ "\"");
} else {
for (Map.Entry<User, Rating> entry : table.getCol(currItem).entrySet()) {
System.out.println("\""
+ currItem
+ "\" has been rated "
+ entry.getValue().getRating() + " by "
+ entry.getKey());
}
}
}
As I said, the implementation is not complete - there is no toString for the table, for example, no remove, removeRow, removeCol, clear, etc.

So, you have two one-many relationships. A user has (gives) many ratings and an item has many ratings; this gives a many-many relationship of users-items which is your problem. Why not simply model it as described:
public class Rating {
private User ratedBy;
private Item itemRated;
public Item getItem() { return itemRated; }
}
public class User {
private Set<Rating> allRatings = new HashSet<>();
public Rating getRatingFor(Item item) {
for(Rating rating: allRatings) {
if (item.equals(rating.getItem())) {
return rating;
}
}
return null;
}
}
public class Item {
private Set<Rating> allRatings = new HashSet<>();
}
... and getters/setters etc as required. You can then get ratings with:
User user1 = new User();
// ... do stuff to populate ratings
Rating itemRatingByUser = user1.getRatingFor(item);

A quick and dirty solution is to use a two dimensional key.
Assuming the id for both user and item is of the same type, you can create a class that is just a holder for the id and the type of the key (the type can be the class of User or Item if available, or just an enum value). Then make the map use keys of this type. Of course, each rating will be referenced at least twice (once for each type).
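A rough sketch of such a two-dimensional key, under my own assumptions: ids are ints, ratings are doubles, and the names TypedKeyDemo, Kind and TypedKey are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of one map keyed by (kind, id).
class TypedKeyDemo {
    enum Kind { USER, ITEM }

    // Holder for the id plus the kind of thing the id refers to.
    static final class TypedKey {
        final Kind kind;
        final int id;

        TypedKey(Kind kind, int id) {
            this.kind = kind;
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof TypedKey)) {
                return false;
            }
            TypedKey that = (TypedKey) other;
            return kind == that.kind && id == that.id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, id);
        }
    }

    private final Map<TypedKey, List<Double>> ratings = new HashMap<>();

    // Each rating is stored twice: once under the user, once under the item.
    void addRating(int userId, int itemId, double rating) {
        ratings.computeIfAbsent(new TypedKey(Kind.USER, userId), k -> new ArrayList<>()).add(rating);
        ratings.computeIfAbsent(new TypedKey(Kind.ITEM, itemId), k -> new ArrayList<>()).add(rating);
    }

    List<Double> ratingsByUser(int userId) {
        return ratings.getOrDefault(new TypedKey(Kind.USER, userId), Collections.emptyList());
    }

    List<Double> ratingsForItem(int itemId) {
        return ratings.getOrDefault(new TypedKey(Kind.ITEM, itemId), Collections.emptyList());
    }

    public static void main(String[] args) {
        TypedKeyDemo demo = new TypedKeyDemo();
        demo.addRating(1, 7, 4.0);
        demo.addRating(2, 7, 5.0);
        System.out.println(demo.ratingsForItem(7)); // [4.0, 5.0]
        System.out.println(demo.ratingsByUser(1));  // [4.0]
        System.out.println(demo.ratingsByUser(7));  // []: user 7 is not item 7
    }
}
```

The kind field is what keeps user 7 and item 7 apart even though they share the same numeric id.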

The data structure you need is called a Table. A table has two keys and a value, and looks, more or less, like an Excel sheet with the two key sets as rows and columns. Hence the name. There are a variety of Table implementations. I think it's normal to use the Guava implementations now.
Guava's documentation explains the Table interface and its API; the implementation that you want is HashBasedTable.

Java 2D HashMap: can't find the right approach to make a 2D HashMap, I mean a HashMap inside another HashMap

I want to make a board of students' names and subjects, where each student has a grade in each subject (or not: a student can skip the exam, and then that cell stays empty). I want to use only HashMaps. I mean, it will be something like this:
HashMap<String,HashMap<String,String>> bigBoard =
new HashMap<String,HashMap<String,String>>();
but I think I don't have the right idea, because for each subject there will be many grades (values), so that won't work. Do I have to make a map for each student, with his subjects? But then the table in the output won't be arranged properly. Do you have a suggestion?
I would like a table that looks something like this, for example.
Column-Key →
Rowkey↓         Mathematics   Physics   Finance
Daniel Dolter   1.3           3.7
Micky Mouse                             5
Minnie Mouse                  1.7       n/a
Dagobert Duck   4.0                     1.0
(I would use Strings for all the keys and values; that keeps it simpler.)
After implementing our class (for example, with the class name String2D), we should be able to use it like this.
public static void main(String[] args) {
String2D map2D = new String2D();
map2D.put("Daniel Doster", "Practical Mathematics", "1.3");
map2D.put("Daniel Doster", "IT Systeme", "3.7");
map2D.put("Micky Mouse", "Finance", "5");
map2D.put("Minnie Mouse", "IT Systeme", "1.7");
map2D.put("Minnie Mouse", "Finance", "n/a");
map2D.put("Dagobert Duck", "Practical Mathematics", "4.0");
map2D.put("Dagobert Duck", "Finance", "1.0");
System.out.println(map2D);
}
No "HashMap" will be seen.. and Arrays aren't allowed

You can use this class:
import java.util.HashMap;
import java.util.Map;
public class BiHashMap<K1, K2, V> {
private final Map<K1, Map<K2, V>> mMap;
public BiHashMap() {
mMap = new HashMap<K1, Map<K2, V>>();
}
/**
* Associates the specified value with the specified keys in this map (optional operation). If the map previously
* contained a mapping for the key, the old value is replaced by the specified value.
*
* @param key1
* the first key
* @param key2
* the second key
* @param value
* the value to be set
* @return the value previously associated with (key1,key2), or <code>null</code> if none
* @see Map#put(Object, Object)
*/
public V put(K1 key1, K2 key2, V value) {
Map<K2, V> map;
if (mMap.containsKey(key1)) {
map = mMap.get(key1);
} else {
map = new HashMap<K2, V>();
mMap.put(key1, map);
}
return map.put(key2, value);
}
/**
* Returns the value to which the specified key is mapped, or <code>null</code> if this map contains no mapping for
* the key.
*
* @param key1
* the first key whose associated value is to be returned
* @param key2
* the second key whose associated value is to be returned
* @return the value to which the specified key is mapped, or <code>null</code> if this map contains no mapping for
* the key
* @see Map#get(Object)
*/
public V get(K1 key1, K2 key2) {
if (mMap.containsKey(key1)) {
return mMap.get(key1).get(key2);
} else {
return null;
}
}
/**
* Returns <code>true</code> if this map contains a mapping for the specified key
*
* @param key1
* the first key whose presence in this map is to be tested
* @param key2
* the second key whose presence in this map is to be tested
* @return Returns true if this map contains a mapping for the specified key
* @see Map#containsKey(Object)
*/
public boolean containsKeys(K1 key1, K2 key2) {
return mMap.containsKey(key1) && mMap.get(key1).containsKey(key2);
}
public void clear() {
mMap.clear();
}
}
And then create and use it like this:
BiHashMap<String,String,String> bigBoard = new BiHashMap<String,String,String>();
However, for performance you may want to store the different grades in an array (assuming that you have a fixed set of courses).

I don't think a nested hashmap is the way to go. Create a Student class and Subject class.
public class Student{
private ArrayList<Subject> SubjectList = new ArrayList<Subject>();
private String name;
public Student(String name){
this.name=name;
}
public void addSubject(Subject s){
SubjectList.add(s);
}
public String getName(){
return this.name;
}
//...add methods for other operations
}
public class Subject{
private ArrayList<Double> GradeList = new ArrayList<Double>();
private String name;
public Subject(String name){
this.name=name;
}
public void addGrade(double s){
GradeList.add(s);
}
//...add methods for other operations
}
Then you can store the Student instances in a HashMap keyed by name.
public static void main(String[] args){
HashMap<String, Student> hm = new HashMap<String, Student>();
Student s = new Student("Daniel Dolter");
Subject sub = new Subject("Mathematics");
sub.addGrade(1.3);
s.addSubject(sub);
hm.put(s.getName(),s);
}

With Java 8 you can use computeIfAbsent to insert a default value when a key is absent.
So you can simply use this as the type of the 2D map:
Map<RowType, Map<ColumnType, ValueType>> map = new WhateverMap<>();
Let's say all types are int (boxed as Integer in the map):
int get(int x, int y) {
return map.computeIfAbsent(x, key -> new WhateverMap<>()).computeIfAbsent(y, key -> 0);
}
void put(int x, int y, int value) {
map.computeIfAbsent(x, key -> new WhateverMap<>()).put(y, value);
}
Note that the two chained calls are not atomic as a whole, so this is not thread-safe even if WhateverMap is.
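For a version you can run as-is, here is a sketch that assumes HashMap in place of WhateverMap; the class name Grid2D and the helper rowCount are made up for illustration. It also shows a side effect worth knowing: get inserts the default into the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around a map of maps, using HashMap as the concrete type.
class Grid2D {
    private final Map<Integer, Map<Integer, Integer>> map = new HashMap<>();

    // Returns the stored value, inserting the default 0 (and the row) if absent.
    int get(int x, int y) {
        return map.computeIfAbsent(x, key -> new HashMap<>())
                  .computeIfAbsent(y, key -> 0);
    }

    void put(int x, int y, int value) {
        map.computeIfAbsent(x, key -> new HashMap<>()).put(y, value);
    }

    // Number of rows that currently exist in the outer map.
    int rowCount() {
        return map.size();
    }

    public static void main(String[] args) {
        Grid2D grid = new Grid2D();
        grid.put(1, 2, 5);
        System.out.println(grid.get(1, 2));  // 5
        System.out.println(grid.rowCount()); // 1
        System.out.println(grid.get(3, 4));  // 0
        System.out.println(grid.rowCount()); // 2: the lookup created row 3
    }
}
```

If lookups should not grow the map, a read-only get built on Map.get or getOrDefault would be the alternative.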

You can use Google Guava's Table<R, C, V> collection. It is similar to eabraham's answer. A value V is keyed by a row R and a column C. It is a better alternative to using HashMap<R, HashMap<C, V>> which becomes quickly unreadable and difficult to work with.
See their GitHub Wiki for more information.

Memory efficient multivaluemap

Hi I have the following problem:
I'm storing strings and a corresponding list of integer values in a MultiValueMap<String, Integer>.
I'm storing about 13 000 000 strings, and one string can have up to 500 or more values.
For every single value I will have random access on the map. So the worst case is 13 000 000 * 500 put calls. The speed of the map is good, but the memory overhead gets quite high. A MultiValueMap<String, Integer> is nothing other than a HashMap or TreeMap<String, ArrayList<Integer>>. Both HashMap and TreeMap have quite a lot of memory overhead. I won't be modifying the map once it is built, but I need it to be fast and as small as possible for random access in a program. (I'm storing it on disk and loading it on start; the serialized map file takes up about 600 MB, but in memory it's about 3 GB.)
The most memory-efficient thing would be to store the strings in a sorted String array and have a corresponding two-dimensional int array for the values. Access would then be a binary search on the String array followed by fetching the corresponding values.
Now I have three ways to get there:
1. I use a sorted MultiValueMap (TreeMap) for the creation phase to store everything. After I'm finished collecting all the values, I get the String array by calling map.keySet().toArray(new String[0]), create a two-dimensional int array, and copy all the values over from the MultiValueMap.
Pro: It's easy to implement, and it is still fast during creation.
Con: It takes up even more memory while copying from the map to the arrays.
2. I use arrays or maybe ArrayLists from the start and store everything in there.
Pro: Least memory overhead.
Con: This would be enormously slow, because I would have to sort/copy the array every time I add a new key. I would also need to implement my own (probably even slower) sorting to keep the corresponding int array in the same order as the strings. Hard to implement.
3. I use arrays and a MultiValueMap as a buffer. After the program has finished 10% or 20% of the creation phase, I add the values to the arrays, keep them in order, and then start a new map.
Pro: Probably still fast enough and memory-efficient enough.
Con: Hard to implement.
None of these solutions really feels right to me. Do you know any other solutions to this problem, maybe a memory-efficient (MultiValue)Map implementation?
I know I could use a database, so don't bother posting that as an answer. I want to know how I could do this without a database.

If you switched to Guava's Multimap -- I have no idea if that's possible for your application -- you might be able to use Trove and get
ListMultimap<String, Integer> multimap = Multimaps.newListMultimap(
new HashMap<String, Collection<Integer>>(),
new Supplier<List<Integer>>() {
public List<Integer> get() {
return new TIntListDecorator();
}
});
which will make a ListMultimap that uses a HashMap to map to List values backed by int[] arrays, which should be memory-efficient, though you'll pay a small speed penalty because of boxing. You might be able to do something similar for MultiValueMap, though I have no idea what library that's from.

You can use compressed strings to reduce drastically the memory usage.
Parameters to configure your JVM
Comparison of its usage between various java versions
Furthermore, there are other more drastic solutions (it would require some reimplementation):
Memory-disk based list implementation or suggestions about NoSQL database.

Depending on which Integer values you store in your map, a large amount of your heap memory overhead may be caused by having distinct Integer instances, which take up much more RAM than a primitive int value.
Consider using a Map from String to one of the many IntArrayList implementations floating around (e.g. in Colt or in Primitive Collections for Java), which basically implement a List backed by an int array instead of being backed by an array of Integer instances.

First, consider the memory taken by the integers. You said that the range will be about 0-4000000. 24 bits is enough to represent 16777216 distinct values. If that is acceptable, you could use byte arrays for the integers, with 3 bytes per integer, and save 25%. You would have to index into the array something like this:
int getPackedInt(byte[] array, int index) {
int i = index*3;
return ((array[i] & 0xFF)<<16) + ((array[i+1] & 0xFF) <<8) + (array[i+2] & 0xFF);
}
void storePackedInt(byte[] array, int index, int value) {
assert value >= 0 && value <= 0xFFFFFF;
int i = index*3;
array[i] = (byte)((value>>16) & 0xFF);
array[i+1] = (byte)((value>>8) & 0xFF);
array[i+2] = (byte)(value & 0xFF);
}
Can you say anything about the distribution of the integers? If many of them will fit in 16 bits, you could use an encoding with a variable number of bytes per number (something like UTF-8 does for representing characters).
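One possible variable-length encoding, sketched under my own assumptions: it stores 7 data bits per byte and uses the high bit as a "more bytes follow" flag, which is a common varint layout and not necessarily the best fit for your data. VarIntDemo, write and read are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical varint codec for non-negative ints: 7 data bits per byte.
class VarIntDemo {
    // Appends value to out; the high bit of each byte marks a continuation.
    static void write(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads one value starting at pos[0] and advances pos[0] past it.
    static int read(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, 5);        // 1 byte
        write(out, 300);      // 2 bytes
        write(out, 4000000);  // 4 bytes
        byte[] data = out.toByteArray();
        System.out.println(data.length); // 7

        int[] pos = {0};
        System.out.println(read(data, pos)); // 5
        System.out.println(read(data, pos)); // 300
        System.out.println(read(data, pos)); // 4000000
    }
}
```

Note the trade-off: values near the top of your range need 4 bytes here, so this only pays off if most values are small. Random access by index is also lost unless you keep a separate offset per list.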
Next, consider whether you can save memory on the Strings. What are the characteristics of the Strings? How long will they typically be? Will many strings share prefixes? A compression scheme tailored to the characteristics of your application could save a lot of space (as falsarella pointed out). OR, if many strings will share prefixes, storing them in some type of search trie could be more efficient. (There is a type of trie called "patricia" which might be suitable for this application.) As a bonus, note that searching for Strings in a trie can be faster than searching a hash map (though you'd have to benchmark to see if that is true in your application).
Will the Strings all be ASCII? If so, 50% of the memory used for Strings will be wasted, as a Java char is 16 bits. Again, in this case, you could consider using byte arrays.
If you only need to look Strings up, not iterate over the stored Strings, you could also consider something rather unconventional: hash the Strings, and keep only the hash. Since different String can hash to the same value, there is a chance that a String which was never stored, may still be "found" by a search. But if you use enough bits for the hash value (and a good hash function), you can make that chance so infinitesimally small that it will almost certainly never happen in the estimated lifespan of the universe.
Finally, there is the memory for the structure itself, which holds the Strings and integers. I already suggested using a trie, but if you decide not to do that, nothing will use less memory than parallel arrays -- one sorted array of Strings (which you can do binary search on, as you said), and a parallel array of arrays of integers. After you do a binary search to find an index into the String array, you can use the same index to access the array-of-integer array.
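The parallel-array layout could look roughly like this. It is a sketch with made-up names (ParallelArrayLookup, get) and it assumes the caller passes keys that are already sorted and values in the matching order:

```java
import java.util.Arrays;

// Hypothetical read-only lookup structure built from two parallel arrays.
class ParallelArrayLookup {
    private final String[] keys;  // must be sorted in natural String order
    private final int[][] values; // values[i] belongs to keys[i]

    ParallelArrayLookup(String[] sortedKeys, int[][] values) {
        this.keys = sortedKeys;
        this.values = values;
    }

    // Returns the ints stored for key, or null if the key is not present.
    int[] get(String key) {
        int index = Arrays.binarySearch(keys, key);
        return index >= 0 ? values[index] : null;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        int[][] values = {{1, 2}, {3}, {4, 5, 6}};
        ParallelArrayLookup lookup = new ParallelArrayLookup(keys, values);

        System.out.println(Arrays.toString(lookup.get("banana"))); // [3]
        System.out.println(lookup.get("durian"));                  // null
    }
}
```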
While you are building the structure, if you do decide that a search trie is a good choice, I would just use that directly. Otherwise, you could do 2 passes: one to build up a set of strings (then put them into an array and sort them), and a second pass to add the arrays of integers.

If there are patterns to your key strings, especially common roots, then a Trie could be an effective method of storing significantly less data.
Here's the code for a working TrieMap.
NB: The usual advice on using EntrySet to iterate across Maps does not apply to Tries. Building the entry set is exceptionally inefficient in a Trie (this implementation rebuilds every key and then looks each one up again), so please avoid requesting one if at all possible.
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
/**
* Implementation of a Trie structure.
*
* A Trie is a compact form of tree that takes advantage of common prefixes
* to the keys.
*
* A normal HashSet will take the key and compute a hash from it, this hash will
* be used to locate the value through various methods but usually some kind
* of bucket system is used. The memory footprint resulting becomes something
* like O(n).
*
* A Trie structure essentially combines all common prefixes into a single key.
* For example, holding the strings A, AB, ABC and ABCD will only take enough
* space to record the presence of ABCD. The presence of the others will be
* recorded as flags within the record of ABCD structure at zero cost.
*
* This structure is useful for holding similar strings such as product IDs or
* credit card numbers.
*
*/
public class TrieMap<V> extends AbstractMap<String, V> implements Map<String, V> {
/**
* Map each character to a sub-trie.
*
* Could replace this with a 256 entry array of Tries but this will handle
* multibyte character sets and I can discard empty maps.
*
* Maintained at null until needed (for better memory footprint).
*
*/
protected Map<Character, TrieMap<V>> children = null;
/**
* Here we store the map contents.
*/
protected V leaf = null;
/**
* Set the leaf value to a new setting and return the old one.
*
* @param newValue
* @return old value of leaf.
*/
protected V setLeaf(V newValue) {
V old = leaf;
leaf = newValue;
return old;
}
/**
* I've always wanted to name a method something like this.
*/
protected void makeChildren () {
if ( children == null ) {
// Use a TreeMap to ensure sorted iteration.
children = new TreeMap<Character, TrieMap<V>>();
}
}
/**
* Finds the TrieMap that "should" contain the key.
*
* @param key
*
* The key to find.
*
* @param grow
*
* Set to true to grow the Trie to fit the key.
*
* @return
*
* The sub Trie that "should" contain the key or null if key was not found and
* grow was false.
*/
protected TrieMap<V> find(String key, boolean grow) {
if (key.length() == 0) {
// Found it!
return this;
} else {
// Not at end of string.
if (grow) {
// Grow the tree.
makeChildren();
}
if (children != null) {
// Ask the kids.
char ch = key.charAt(0);
TrieMap<V> child = children.get(ch);
if (child == null && grow) {
// Make the child.
child = new TrieMap<V>();
// Store the child.
children.put(ch, child);
}
if (child != null) {
// Find it in the child.
return child.find(tail(key), grow);
}
}
}
return null;
}
/**
* Remove the head (first character) from the string.
*
* @param s
*
* The string.
*
* @return
*
* The same string without the first (head) character.
*
*/
// Suppress warnings over taking a subsequence
private String tail(String s) {
return s.substring(1, s.length());
}
/**
*
* Add a new value to the map.
*
* Time footprint = O(key.length()).
*
* @param key
*
* The key defining the place to add.
*
* @param value
*
* The value to add there.
*
* @return
*
* The value that was there, or null if it wasn't.
*
*/
@Override
public V put(String key, V value) {
V old = null;
// If empty string.
if (key.length() == 0) {
old = setLeaf(value);
} else {
// Find it.
old = find(key, true).put("", value);
}
return old;
}
/**
* Gets the value at the specified key position.
*
* @param o
*
* The key to the location.
*
* @return
*
* The value at that location, or null if there is no value at that location.
*/
@Override
public V get(Object o) {
V got = null;
if (o != null) {
String key = (String) o;
TrieMap<V> it = find(key, false);
if (it != null) {
got = it.leaf;
}
} else {
throw new NullPointerException("Nulls not allowed.");
}
return got;
}
/**
* Remove the value at the specified location.
*
* @param o
*
* The key to the location.
*
* @return
*
* The value that was removed, or null if there was no value at that location.
*/
@Override
public V remove(Object o) {
V old = null;
if (o != null) {
String key = (String) o;
if (key.length() == 0) {
// Its me!
old = leaf;
leaf = null;
} else {
TrieMap<V> it = find(key, false);
if (it != null) {
old = it.remove("");
}
}
} else {
throw new NullPointerException("Nulls not allowed.");
}
return old;
}
/**
* Count the number of values in the structure.
*
* @return
*
* The number of values in the structure.
*/
@Override
public int size() {
// If I am a leaf then size increases by 1.
int size = leaf != null ? 1 : 0;
if (children != null) {
// Add sizes of all my children.
for (Character c : children.keySet()) {
size += children.get(c).size();
}
}
return size;
}
/**
* Is the tree empty?
*
* @return
*
* true if the tree is empty.
* false if there is still at least one value in the tree.
*/
@Override
public boolean isEmpty() {
// I am empty if I am not a leaf and I have no children
// (slightly quicker than the AbstractMap implementation).
return leaf == null && (children == null || children.isEmpty());
}
/**
* Returns all keys as a Set.
*
* @return
*
* A HashSet of all keys.
*
* Note: Although it returns Set<S> it is actually a Set<String> that has been
* home-grown because the original keys are not stored in the structure
* anywhere.
*/
@Override
public Set<String> keySet() {
// Roll them a temporary list and give them a Set from it.
return new HashSet<String>(keyList());
}
/**
* List all my keys.
*
* @return
*
* An ArrayList of all keys in the tree.
*
* Note: Although it returns List<S> it is actually a List<String> that has been
* home-grown because the original keys are not stored in the structure
* anywhere.
*
*/
protected List<String> keyList() {
List<String> contents = new ArrayList<String>();
if (leaf != null) {
// If I am a leaf, a null string is in the set.
contents.add((String) "");
}
// Add all sub-tries.
if (children != null) {
for (Character c : children.keySet()) {
TrieMap<V> child = children.get(c);
List<String> childContents = child.keyList();
for (String subString : childContents) {
// All possible substrings can be prepended with this character.
contents.add((String) (c + subString.toString()));
}
}
}
return contents;
}
/**
* Does the map contain the specified key.
*
* @param key
*
* The key to look for.
*
* @return
*
* true if the key is in the Map.
* false if not.
*/
public boolean containsKey(String key) {
TrieMap<V> it = find(key, false);
if (it != null) {
return it.leaf != null;
}
return false;
}
/**
* Represent me as a list.
*
* @return
*
* A String representation of the tree.
*/
@Override
public String toString() {
List<String> list = keyList();
//Collections.sort((List<String>)list);
StringBuilder sb = new StringBuilder();
// Empty before the first entry, a comma before every later one.
String separator = "";
sb.append("{");
for (String s : list) {
sb.append(separator).append(s).append("=").append(get(s));
separator = ",";
}
sb.append("}");
return sb.toString();
}
/**
* Clear down completely.
*/
@Override
public void clear() {
children = null;
leaf = null;
}
/**
* Return a list of key/value pairs.
*
* @return
*
* The entry set.
*/
public Set<Map.Entry<String, V>> entrySet() {
Set<Map.Entry<String, V>> entries = new HashSet<Map.Entry<String, V>>();
List<String> keys = keyList();
for (String key : keys) {
entries.add(new Entry<String,V>(key, get(key)));
}
return entries;
}
/**
* An entry.
*
* @param <S>
*
* The type of the key.
*
* @param <V>
*
* The type of the value.
*/
private static class Entry<S, V> implements Map.Entry<S, V> {
protected S key;
protected V value;
public Entry(S key, V value) {
this.key = key;
this.value = value;
}
public S getKey() {
return key;
}
public V getValue() {
return value;
}
public V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
@Override
public boolean equals(Object o) {
if (!(o instanceof TrieMap.Entry)) {
return false;
}
Entry e = (Entry) o;
return (key == null ? e.getKey() == null : key.equals(e.getKey()))
&& (value == null ? e.getValue() == null : value.equals(e.getValue()));
}
@Override
public int hashCode() {
int keyHash = (key == null ? 0 : key.hashCode());
int valueHash = (value == null ? 0 : value.hashCode());
return keyHash ^ valueHash;
}
@Override
public String toString() {
return key + "=" + value;
}
}
}

Java: Is there a container which effectively combines HashMap and ArrayList?

I keep finding a need for a container which is both a HashMap (for fast lookup on a key type) and an ArrayList (for fast access by integer index).
LinkedHashMap is almost right, in that it keeps an iterable list, but it is unfortunately a linked list... retrieving the Nth element requires iterating from 1 to N.
Is there a container type which fits this bill and which I've somehow missed? What do other people do when they need to access the same set of data by key and by index?

Take a look at Apache Commons LinkedMap.

If you are removing (in the middle) as well as accessing by index and by key (which means that the indexes are changing), you are possibly out of luck - I think there simply can't be an implementation which provides O(1) for both remove (by index, key or iterator) and get(index). This is why we have both LinkedList (with iterator.remove() or remove(0) in O(1)) and ArrayList (with get(index) in O(1)) in the standard API.
You could have both removing and index-getting in O(log n) if you use a tree structure instead of array or linked list (which could be combined with a O(1) key based read access - getting the index for your key-value-pair would still need O(log n), though).
If you don't want to remove anything, or can live with the following indexes not being shifted (i.e. remove(i) being equivalent to set(i, null)), there is nothing which forbids having both O(1) index and key access - in fact, the index is then simply a second key, so you could use a HashMap and an ArrayList (or two HashMaps), with a thin wrapper combining both.
Edit: So, here is an implementation of ArrayHashMap as described in the last paragraph above (using the "expensive remove" variant). It implements the interface IndexedMap below. (If you don't want to copy+paste here, both are also in my github account which will be updated in case of later changes).
package de.fencing_game.paul.examples;
import java.util.*;
/**
* A combination of ArrayList and HashMap which allows O(1) for read and
* modifying access by index and by key.
* <p>
* Removal (either by key or by index) is O(n), though,
* as is indexed addition of a new Entry somewhere else than the end.
* (Adding at the end is in amortized O(1).)
* </p>
* <p>
* (The O(1) complexity for key based operations is under the condition
* "if the hashCode() method of the keys has a suitable distribution and
* takes constant time", as for any hash-based data structure.)
* </p>
* <p>
* This map allows null keys and values, but clients should think about
* avoiding using these, since some methods return null to show
* "no such mapping".
* </p>
* <p>
* This class is not thread-safe (like ArrayList and HashMap themselves).
* </p>
* <p>
* This class is inspired by the question
* Is there a container which effectively combines HashMap and ArrayList? on Stackoverflow.
* </p>
* @author Paŭlo Ebermann
*/
public class ArrayHashMap<K,V>
extends AbstractMap<K,V>
implements IndexedMap<K,V>
{
/**
* Our backing map.
*/
private Map<K, SimpleEntry<K,V>> baseMap;
/**
* our backing list.
*/
private List<SimpleEntry<K,V>> entries;
/**
* creates a new ArrayHashMap with default parameters.
* (TODO: add more constructors which allow tuning.)
*/
public ArrayHashMap() {
this.baseMap = new HashMap<K,SimpleEntry<K,V>>();
this.entries = new ArrayList<SimpleEntry<K,V>>();
}
/**
* puts a new key-value mapping, or changes an existing one.
*
* If new, the mapping gets an index at the end (i.e. {@link #size()}
* before it gets increased).
*
* This method runs in O(1) time for changing an existing value,
* amortized O(1) time for adding a new value.
*
* @return the old value, if any, else null.
*/
public V put(K key, V value) {
SimpleEntry<K,V> entry = baseMap.get(key);
if(entry == null) {
entry = new SimpleEntry<K,V>(key, value);
baseMap.put(key, entry);
entries.add(entry);
return null;
}
return entry.setValue(value);
}
/**
* retrieves the value for a key.
*
* This method runs in O(1) time.
*
* @return null if there is no such mapping,
* else the value for the key.
*/
public V get(Object key) {
SimpleEntry<K,V> entry = baseMap.get(key);
return entry == null ? null : entry.getValue();
}
/**
* returns true if the given key is in the map.
*
* This method runs in O(1) time.
*
*/
public boolean containsKey(Object key) {
return baseMap.containsKey(key);
}
/**
* removes a key from the map.
*
* This method runs in O(n) time, n being the size of this map.
*
* @return the old value, if any.
*/
public V remove(Object key) {
SimpleEntry<K,V> entry = baseMap.remove(key);
if(entry == null) {
return null;
}
entries.remove(entry);
return entry.getValue();
}
/**
* returns a key by index.
*
* This method runs in O(1) time.
*
*/
public K getKey(int index) {
return entries.get(index).getKey();
}
/**
* returns a value by index.
*
* This method runs in O(1) time.
*
*/
public V getValue(int index) {
return entries.get(index).getValue();
}
/**
* Returns a set view of the keys of this map.
*
* This set view is ordered by the indexes.
*
* It supports removal by key or iterator in O(n) time.
* Containment check runs in O(1).
*/
public Set<K> keySet() {
return new AbstractSet<K>() {
public void clear() {
entryList().clear();
}
public int size() {
return entries.size();
}
public Iterator<K> iterator() {
return keyList().iterator();
}
public boolean remove(Object key) {
return keyList().remove(key);
}
public boolean contains(Object key) {
return keyList().contains(key);
}
};
} // keySet()
/**
* Returns a set view of the entries of this map.
*
* This set view is ordered by the indexes.
*
* It supports removal by entry or iterator in O(n) time.
*
* It supports adding new entries at the end, if the key
* is not already used in this map, in amortized O(1) time.
*
* Containment check runs in O(1).
*/
public Set<Map.Entry<K,V>> entrySet() {
return new AbstractSet<Map.Entry<K,V>>() {
public void clear() {
entryList().clear();
}
public int size() {
return entries.size();
}
public Iterator<Map.Entry<K,V>> iterator() {
return entryList().iterator();
}
public boolean add(Map.Entry<K,V> e) {
return entryList().add(e);
}
public boolean contains(Object o) {
return entryList().contains(o);
}
public boolean remove(Object o) {
return entryList().remove(o);
}
};
} // entrySet()
/**
* Returns a list view of the entries of this map.
*
* This list view is ordered by the indexes.
*
* It supports removal by entry, iterator or sublist.clear in O(n) time.
* (n being the length of the total list, not the sublist).
*
* It supports adding new entries at the end, if the key
* is not already used in this map, in amortized O(1) time.
*
* Containment check runs in O(1).
*/
public List<Map.Entry<K,V>> entryList() {
return new AbstractList<Map.Entry<K,V>>() {
public void clear() {
baseMap.clear();
entries.clear();
}
public Map.Entry<K,V> get(int index) {
return entries.get(index);
}
public int size() {
return entries.size();
}
public Map.Entry<K,V> remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e;
}
public void add(int index, Map.Entry<K,V> newEntry) {
K key = newEntry.getKey();
SimpleEntry<K,V> clone = new SimpleEntry<K,V>(newEntry);
if(baseMap.containsKey(key)) {
throw new IllegalArgumentException("duplicate key " +
key);
}
entries.add(index, clone);
baseMap.put(key, clone);
}
public boolean contains(Object o) {
if(o instanceof Map.Entry) {
SimpleEntry<K,V> inMap =
baseMap.get(((Map.Entry<?,?>)o).getKey());
return inMap != null &&
inMap.equals(o);
}
return false;
}
public boolean remove(Object o) {
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
SimpleEntry<K,V> inMap = baseMap.get(e.getKey());
if(inMap != null && inMap.equals(e)) {
entries.remove(inMap);
baseMap.remove(inMap.getKey());
return true;
}
}
return false;
}
protected void removeRange(int fromIndex, int toIndex) {
List<SimpleEntry<K,V>> subList =
entries.subList(fromIndex, toIndex);
for(SimpleEntry<K,V> entry : subList){
baseMap.remove(entry.getKey());
}
subList.clear();
}
};
} // entryList()
/**
* Returns a List view of the keys in this map.
*
* It allows index read access and key containment check in O(1).
* Changing a key is not allowed.
*
* Removal by key, index, iterator or sublist.clear runs in O(n) time
* (this removes the corresponding values, too).
*/
public List<K> keyList() {
return new AbstractList<K>() {
public void clear() {
entryList().clear();
}
public K get(int index) {
return entries.get(index).getKey();
}
public int size() {
return entries.size();
}
public K remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e.getKey();
}
public boolean remove(Object key) {
SimpleEntry<K,V> entry = baseMap.remove(key);
if(entry == null) {
return false;
}
entries.remove(entry);
return true;
}
public boolean contains(Object key) {
return baseMap.containsKey(key);
}
protected void removeRange(int fromIndex, int toIndex) {
entryList().subList(fromIndex, toIndex).clear();
}
};
} // keyList()
/**
* Returns a List view of the values in this map.
*
* It allows get and set by index in O(1) time (set changes the mapping).
*
* Removal by value, index, iterator or sublist.clear is possible
* in O(n) time, this removes the corresponding keys too (only the first
* key with this value for remove(value)).
*
* Containment check needs an iteration, thus O(n) time.
*/
public List<V> values() {
return new AbstractList<V>() {
public int size() {
return entries.size();
}
public void clear() {
entryList().clear();
}
public V get(int index) {
return entries.get(index).getValue();
}
public V set(int index, V newValue) {
Map.Entry<K,V> e = entries.get(index);
return e.setValue(newValue);
}
public V remove(int index) {
Map.Entry<K,V> e = entries.remove(index);
baseMap.remove(e.getKey());
return e.getValue();
}
protected void removeRange(int fromIndex, int toIndex) {
entryList().subList(fromIndex, toIndex).clear();
}
};
} // values()
/**
* a usage example method.
*/
public static void main(String[] args) {
IndexedMap<String,String> imap = new ArrayHashMap<String, String>();
for(int i = 0; i < args.length-1; i+=2) {
imap.put(args[i], args[i+1]);
}
System.out.println(imap.values());
System.out.println(imap.keyList());
System.out.println(imap.entryList());
System.out.println(imap);
System.out.println(imap.getKey(0));
System.out.println(imap.getValue(0));
}
}
Here is the interface:
package de.fencing_game.paul.examples;
import java.util.*;
/**
* A map which additionally to key-based access allows index-based access
* to keys and values.
* <p>
* Inspired by the question Is there a container which effectively combines HashMap and ArrayList? on Stackoverflow.
* </p>
* @author Paŭlo Ebermann
* @see ArrayHashMap
*/
public interface IndexedMap<K,V>
extends Map<K,V>
{
/**
* returns a list view of the {@link #entrySet} of this Map.
*
* This list view supports removal of entries, if the map is mutable.
*
* It may also support indexed addition of new entries per the
* {@link List#add add} method - but this throws an
* {@link IllegalArgumentException} if the key is already used.
*/
public List<Map.Entry<K,V>> entryList();
/**
* returns a list view of the {@link #keySet}.
*
* This list view supports removal of keys (with the corresponding
* values), but does not support addition of new keys.
*/
public List<K> keyList();
/**
* returns a list view of values contained in this map.
*
* This list view supports removal of values (with the corresponding
* keys), but does not support addition of new values.
* It may support the {@link List#set set} operation to change the
* value for a key.
*/
public List<V> values();
/**
* Returns a value of this map by index.
*
* This is equivalent to
* {@link #values() values()}.{@link List#get get}{@code (index)}.
*/
public V getValue(int index);
/**
* Returns a key of this map by index.
*
* This is equivalent to
* {@link #keyList keyList()}.{@link List#get get}{@code (index)}.
*/
public K getKey(int index);
}

Why don't you simply keep the HashMap and then use hashMap.entrySet().toArray(); as suggested here?
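A small sketch of what that would look like. EntryArraySnapshot is a hypothetical helper name. Two caveats that are easy to miss: a plain HashMap has no defined iteration order, so the indexes are only meaningful with something like LinkedHashMap, and the array is a snapshot that has to be rebuilt (O(n)) after every put or remove:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: copies a map's entries into an array for index access.
class EntryArraySnapshot {
    /** Returns the entries in the map's iteration order at the time of the call. */
    @SuppressWarnings("unchecked")
    static <K, V> Map.Entry<K, V>[] snapshot(Map<K, V> map) {
        return map.entrySet().toArray(new Map.Entry[0]);
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order, so index 1 is the second key added.
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Map.Entry<String, Integer>[] entries = snapshot(map);
        System.out.println(entries[1].getKey() + "=" + entries[1].getValue());
    }
}
```

This is fine when the map is built once and then only read. If it changes often, the repeated copying defeats the purpose.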

You could do it yourself, but here's an example implementation. The corresponding Google search term would be "ArrayMap".
I'm not sure but maybe commons collections or google collections has such a map.
Edit:
You can create a hashmap that is implemented using an arraylist, i.e. it would work like LinkedHashMap in that the insertion order defines the list index. This would provide fast get(Index) (O(1)) and get(Name) (O(1)) access, insertion would be O(1) as well (unless the array must be extended), but deletion would be O(n) since deleting the first element would require all indices to be updated.
The trick could be done by a map that internally holds a Map for key-> index and then an ArrayList.
get(Key) would then be (simple example without error checking):
list.get(keyIndexMap.get(key));
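Fleshed out a little, the key-to-index idea might look like the following. This is only a sketch: IndexedLookup and its method names are invented for illustration, and I keep a second list of keys so that removal can fix up the indexes of the entries that shift:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: a Map from key to index plus ArrayLists holding keys and values.
class IndexedLookup<K, V> {
    private final Map<K, Integer> keyIndexMap = new HashMap<>();
    private final List<K> keys = new ArrayList<>();
    private final List<V> list = new ArrayList<>();

    /** O(1) amortized: new keys are appended, existing keys keep their index. */
    void put(K key, V value) {
        Integer index = keyIndexMap.get(key);
        if (index == null) {
            keyIndexMap.put(key, list.size());
            keys.add(key);
            list.add(value);
        } else {
            list.set(index, value);
        }
    }

    /** O(1) lookup by key; returns null if the key is absent. */
    V getByKey(K key) {
        Integer index = keyIndexMap.get(key);
        return index == null ? null : list.get(index);
    }

    /** O(1) lookup by insertion index. */
    V getByIndex(int index) {
        return list.get(index);
    }

    /** O(n): every entry after the removed one moves down and needs its index updated. */
    V remove(K key) {
        Integer boxed = keyIndexMap.remove(key);
        if (boxed == null) {
            return null;
        }
        int index = boxed.intValue();
        keys.remove(index);
        V old = list.remove(index);
        for (int i = index; i < keys.size(); i++) {
            keyIndexMap.put(keys.get(i), i);
        }
        return old;
    }
}
```

Usage would be along the lines of put("a", 10), put("b", 20), then getByIndex(1) and getByKey("b") both return 20. After remove("a"), the entry for "b" moves to index 0.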
