As stated in the topic, how can I check if Hashtable is full (if one could do that at all)?
Having:
HashMap<Integer, Person> p = new HashMap<>();
I imagine one needs to use e.g.
if (p.size() > /* p's "capacity"? */ ...)
I found somewhere that hashtables are created with a default size 11 and their capacity is automatically increased if needed... So in the end, can Hashtable ever be full?
Hashtables are created with a default size 11
That is not the size of the Hashtable, it's the number of hash buckets it has.
Obviously, a table with 11 hash buckets can hold fewer than 11 items. Perhaps less obviously, a table with 11 buckets may also hold more than 11 items, depending on the collision resolution in use.
can Hashtable ever be full?
This depends on the implementation. Hash tables that use separate chaining, such as Java's HashMap, cannot get full, even if all their buckets are exhausted, because we can continue adding items to individual chains of each bucket. However, using too few hash buckets leads to significant loss of performance.
On the other hand, hash tables with linear probing, such as Java's IdentityHashMap (which is not, strictly speaking, a valid hash-based container), can get full when you run out of buckets.
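To make the separate-chaining point concrete, here is a small made-up sketch (not from the question): a HashMap created with a small initial capacity hint keeps accepting entries without ever reporting being "full", because it chains colliding entries and resizes its table internally.

import java.util.HashMap;
import java.util.Map;

public class NeverFullDemo {
    public static void main(String[] args) {
        // A small initial capacity hint; the table will chain and grow as needed.
        Map<Integer, String> map = new HashMap<>(11);
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value-" + i);   // never refused: there is no "full" condition
        }
        System.out.println(map.size()); // 1000
    }
}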
HashMap has a maximum capacity of 1073741824 (1 << 30) buckets in its backing table, theoretically.
From the source code of HashMap:
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
But in practice the limit is the number of elements a managed array (used as the backing table) can hold in Java, and the JVM might fail with an OutOfMemoryError when you try to allocate such big arrays.
That said, if the key distribution is really awful (too many entries colliding into the same buckets), the HashMap wouldn't need to allocate or reallocate big arrays, because the keys are not well distributed; instead it would be allocating more tree or list nodes, depending on the nature of the keys.
The capacity argument provides a hint to the implementation of an initial size for its internal table. This can save a few internal resizes.
However, a HashMap won't stop accepting put()s unless the JVM encounters an OutOfMemoryError.
Under the covers, a hashmap is an array. Hashes are used as array indices. Each array element is a reference to a linked list of Entry objects. The linked list can be arbitrarily long.
Related
I have data that I want to lookup by key.
My particular use case is that the data (keys/values and the number of elements) does not change once the map is initialised. All key/value pairs are known up front.
I have generally used a HashMap for this with the default constructor (default initial capacity and load factor).
What is the best way to build this Map? If I were to use HashMap, what should the initial capacity and load factor be set to? Is Map.copyOf() a better solution? Does the size of the map matter (20 elements vs 140,000)?
This article https://docs.oracle.com/en/java/javase/15/core/creating-immutable-lists-sets-and-maps.html#GUID-6A9BAE41-A1AD-4AA1-AF1A-A8FC99A14199 seems to imply that the immutable Map returned by Map.copyOf() is more space efficient.
HashMap is fairly close to optimal in most cases already. The array of buckets doubles in capacity each time, so it's most wasteful when you have (2^N) + 1 items, since the capacity will necessarily be 2^(N+1) (i.e. 2049 items require capacity of 4096, but 2048 items fit perfectly).
In your case, specifying an initial size will only prevent a few reallocations when the map is created, which if it only happens once probably isn't relevant. Load factor is not relevant because the map's capacity will never change. In any case, if you did want to pre-size, this would be correct:
new HashMap<>(numItems, 1);
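For the build-once use case in the question, one possible way to combine that pre-sizing with an immutable result is sketched below (the numItems value and the key/value types are placeholders; Map.copyOf requires Java 10+):

import java.util.HashMap;
import java.util.Map;

public class BuildOnceMap {
    public static void main(String[] args) {
        int numItems = 140_000; // placeholder for the known number of entries

        // Pre-size the mutable map so it never resizes while being filled.
        Map<Integer, String> builder = new HashMap<>(numItems, 1f);
        for (int i = 0; i < numItems; i++) {
            builder.put(i, "value-" + i);
        }

        // Take an immutable, compact copy once all entries are known (Java 10+).
        Map<Integer, String> lookup = Map.copyOf(builder);
        System.out.println(lookup.size());
    }
}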
Does the size of the map matter (20 elements vs 140,000)?
It will have an impact, but not a massive one. Items are grouped into buckets, and buckets are structured as lists or trees. So the performance is mostly dependent on how many items are in a given bucket, rather than the total number of items across all buckets.
What's important is how evenly distributed across your buckets the items are. A bad hash code implementation will result in clustering. Clustering will start to move O(1) operations towards O(log n), I believe.
// The worst possible hashCode implementation.
@Override
public int hashCode() { return 0; } // or any other constant
If you have the same items in the map across multiple invocations of your application (not clear from the question if that's the case), and if the class of the key is under your control, then you have the luxury of being able to tweak the hashCode implementation to positively affect the distribution, e.g. by using different prime numbers as a modulus. This would be trial and error, though, and is really only a micro-optimization.
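To make the "tweak the hashCode implementation" idea concrete, here is a hypothetical key class (the fields and the multiplier are invented for illustration); the trial-and-error step would be swapping the prime and measuring the resulting bucket distribution:

// Hypothetical key class; the fields and the prime multiplier are illustrative only.
public final class PersonKey {
    private final String firstName;
    private final String lastName;

    public PersonKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonKey)) return false;
        PersonKey other = (PersonKey) o;
        return firstName.equals(other.firstName) && lastName.equals(other.lastName);
    }

    @Override
    public int hashCode() {
        // 31 is the conventional multiplier; trying other primes (e.g. 37, 61)
        // and measuring the bucket distribution is the trial-and-error step.
        return 31 * firstName.hashCode() + lastName.hashCode();
    }
}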
As for the comments/answers addressing how to confer immutability, I'd argue that that's a separate concern. First work out what map is actually optimal, then worry about how to confer immutability upon it, if it isn't already. You can always wrap a mutable map in Collections.unmodifiableMap. Supposedly Guava's ImmutableMap is slower than HashMap, and I suspect other immutable variants will struggle to exceed the performance of HashMap too.
They say HashMap's put and get operations have constant time complexity. Is it still going to be O(1) if it is implemented with a dynamic array?
ex:
import java.util.ArrayList;
import java.util.LinkedList;

public class HashMap<K, V> {

    private static class Entry<K, V> {
        private K key;
        private V value;

        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // buckets: each slot holds a chain of entries
    private ArrayList<LinkedList<Entry<K, V>>> buckets = new ArrayList<>();

    // the rest of the implementation
    // ...
}
HashMap already uses a dynamic array:
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
Using an ArrayList instead of a manually resized array does not increase the time complexity.
The issue for puts is how often you're going to need to extend the ArrayList.
This is probably little different in overhead to extending a plain array: you have to allocate a new one and rehash.
Note you'll need to know the intended ArrayList size in order to compute the hash index (as hash code % array size), so you should allocate the ArrayList with that capacity initially and then populate it with nulls, since the list elements don't exist until they are added.
Similarly for when you rehash.
You can of course do it wrong: you could compute a size for use in computing the hash index, and then not extend the ArrayList accordingly. Then you'd suffer from arbitrary need to extend the ArrayList whenever you had an index higher than one you'd seen before, which may require reallocation internal to the ArrayList.
In short: there's no significant performance penalty for using an ArrayList if you do it in a reasonable way, but no particular benefit either.
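As a rough illustration of the "reasonable way" described above, here is a minimal sketch (the names and structure are purely illustrative, not from any real library) that allocates the ArrayList at its intended size, populates it with nulls, and then never grows it:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: an ArrayList used as a fixed-size bucket table,
// pre-filled with nulls so every index exists before any entry is added.
public class BucketTableSketch<K, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private static final int TABLE_SIZE = 16;
    private final List<LinkedList<Entry<K, V>>> buckets = new ArrayList<>(TABLE_SIZE);

    public BucketTableSketch() {
        for (int i = 0; i < TABLE_SIZE; i++) {
            buckets.add(null); // make indices 0..TABLE_SIZE-1 addressable
        }
    }

    public void put(K key, V value) {
        int index = Math.floorMod(key.hashCode(), TABLE_SIZE); // hash code % table size
        LinkedList<Entry<K, V>> chain = buckets.get(index);
        if (chain == null) {
            chain = new LinkedList<>();
            buckets.set(index, chain); // set(), not add(): the table size stays fixed
        }
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing mapping
        }
        chain.add(new Entry<>(key, value));
    }
}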
What is the time complexity of a HashMap implemented using dynamic array?
The short answer is: It depends on how you actually implement the put and get methods, and the rehashing. However, assuming that you have gotten it right, then the complexity would be the same as with classic arrays.
Note that a typical hash table implementation will not benefit from using a dynamic array (aka a List).
In between resizes, a hash table has an array whose size is fixed. Entries are added and removed from buckets, but the number of buckets and their positions in the array do not change. Since the code is not changing the array size, there is no benefit in using dynamic arrays in between the resizes.
When the hash table resizes, it needs to create a new array (typically about twice the size of the current one). Then it recomputes the entry -> bucket mappings and redistributes the entries. Note that a new array is required, since it is not feasible to redistribute the entries "in place". Secondly, the size of the new array will be known (and fixed) when it is allocated. So again there is no benefit here from using a dynamic array.
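For illustration, here is a simplified sketch of that resize step (not taken from any real implementation; the names and the chained-bucket representation are assumptions): a brand-new array of twice the length is allocated and every entry is re-bucketed against the new length.

import java.util.LinkedList;
import java.util.Map;

// Simplified sketch of a resize: a new, fixed-size array is allocated and
// every entry is re-bucketed against the new length.
final class ResizeSketch {

    @SuppressWarnings("unchecked")
    static <K, V> LinkedList<Map.Entry<K, V>>[] resize(LinkedList<Map.Entry<K, V>>[] oldTable) {
        LinkedList<Map.Entry<K, V>>[] newTable = new LinkedList[oldTable.length * 2];
        for (LinkedList<Map.Entry<K, V>> chain : oldTable) {
            if (chain == null) continue;
            for (Map.Entry<K, V> entry : chain) {
                int index = Math.floorMod(entry.getKey().hashCode(), newTable.length);
                if (newTable[index] == null) {
                    newTable[index] = new LinkedList<>();
                }
                newTable[index].add(entry); // entry -> bucket mapping recomputed
            }
        }
        return newTable; // the old array is discarded; the new size is fixed until the next resize
    }
}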
Add to this that for all primitive operations on an array, the equivalent operations for a dynamic array (i.e. ArrayList¹) have a small performance overhead. So there will be a small performance hit from using a dynamic array in a hash table implementation.
Finally, you need to be careful when talking about complexity of hash table implementations. The average complexity of HashMap.put (and similar) is O(1) amortized.
A single put operation may be O(N) if it triggers a resize.
If the hash function is pathological, all operations can be O(N).
If you choose an inappropriate load factor, performance will suffer.
If you implement an incorrect resizing policy then performance will suffer.
(Amortized means averaged over all similar operations on the table. For example, N insertions with different keys into an empty table is O(N) ... or O(1) amortized per insertion.)
In short: the complexity of hash tables is ... complex.
¹ A Vector will be slightly worse than an ArrayList, but a LinkedList would actually make the complexity for get and put O(N) instead of O(1). I'll leave you to figure out the details.
What are the main differences between an ArrayList and an ArrayMap? Which one is more efficient and faster for non-threaded applications?
The documentation says ArrayMap is a generic key->value mapping data structure, so what are the differences between ArrayMap and HashMap? Are they the same?
ArrayMap keeps its mappings in an array data structure — an integer array of hash codes for each item, and an Object array of the key -> value pairs.
ArrayList, on the other hand, is a List: a resizable-array implementation of the List interface that implements all optional list operations and permits all elements, including null.
FYI
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional HashMap.
Note that ArrayMap implementation is not intended to be appropriate for
data structures that may contain large numbers of items. It is
generally slower than a traditional HashMap, since lookups require a
binary search and adds and removes require inserting and deleting
entries in the array. For containers holding up to hundreds of items,
the performance difference is not significant, less than 50%.
FROM DOCS
ArrayList
The ArrayList class extends AbstractList and implements the List interface. ArrayList supports dynamic arrays that can grow as needed.
Array lists are created with an initial size. When this size is exceeded, the collection is automatically enlarged. When objects are removed, the array may be shrunk.
Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
ArrayMap
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional HashMap, this implementation is a version of the platform's android.util.ArrayMap that can be used on older versions of the platform. It keeps its mappings in an array data structure -- an integer array of hash codes for each item, and an Object array of the key/value pairs. This allows it to avoid having to create an extra object for every entry put in to the map, and it also tries to control the growth of the size of these arrays more aggressively (since growing them only requires copying the entries in the array, not rebuilding a hash map).
Read Arraymap vs Hashmap
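To make the parallel-array description above concrete, here is a sketch of the idea only (this is not Android's actual ArrayMap code): hashes live in a sorted int[] that is searched with a binary search, and keys and values sit side by side in an Object[] at 2*index and 2*index + 1.

import java.util.Arrays;

// Sketch of the idea only, not Android's implementation.
final class ArrayMapIdea {
    int[] hashes = new int[0];       // sorted hash codes, one per entry
    Object[] keyValues = new Object[0]; // key at 2*i, value at 2*i + 1
    int size = 0;

    Object get(Object key) {
        int hash = key.hashCode();
        int index = Arrays.binarySearch(hashes, 0, size, hash);
        if (index < 0) {
            return null; // no entry with this hash
        }
        // A real implementation would also scan neighbours with the same hash
        // and compare keys with equals(); omitted here for brevity.
        return keyValues[2 * index + 1];
    }
}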
Why does HashMap in java internally use array to store Entry Objects and not an ArrayList ?
The reason for this is most likely that HashMap needs to control how its internal table is resized according to the number of entries and the given loadFactor.
Since ArrayList doesn't expose methods to resize its internal array to specific sizes (HashMap uses powers of 2 for its size to optimize rehashing, but ArrayList multiplies capacity by 1.5), it simply wasn't an option to be considered.
Also, even if ArrayList did increase capacity in the same way, relying on this internal detail would tie these two classes together, leaving no room to change the internal implementation of ArrayList at a later date as it could break HashMap or at the very least make it less memory efficient.
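For illustration, a simplified version of that power-of-two rounding could look like the sketch below (this is not the exact JDK code; the real HashMap.tableSizeFor uses bit tricks), in contrast with ArrayList, which grows its array by roughly 1.5x:

final class CapacityDemo {
    // Simplified illustration of rounding a requested capacity up to the next
    // power of two, as HashMap does; ArrayList instead grows by about 1.5x.
    static int nextPowerOfTwo(int requestedCapacity) {
        int n = 1;
        while (n < requestedCapacity) {
            n <<= 1; // 1, 2, 4, 8, ... capped at 1 << 30 in the real HashMap
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(17)); // 32
        System.out.println(nextPowerOfTwo(64)); // 64
    }
}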
Is there a theoretical limit for the number of key entries that can be stored in a HashMap or does the maximum purely depend on the heap memory available?
Also, which data structure is the best to store a very large number of objects (say several hundred thousand objects)?
Is there a theoretical limit for the number of key entries that can be stored in a HashMap or does it purely depend on the heap memory available?
Looking at the documentation of that class, I would say that the theoretical limit is Integer.MAX_VALUE (2^31 - 1 = 2147483647) elements.
This is because to properly implement this class, the size() method is obliged to return an int representing the number of key/value pairs.
From the documentation of HashMap.size()
Returns: the number of key-value mappings in this map
Note: This question is very similar to How many data a list can hold at the maximum.
which data structure is the best to store a very large number of objects (say several hundred thousand objects)?
I would say it depends on what you need to store and what type of access you require. All built-in collections are probably well optimized for large quantities.
HashMap holds the values in an array, which can hold up to Integer.MAX_VALUE entries. But this does not count collisions. Each Entry has a next field, which is also an Entry. This is how collisions (two or more objects with the same hash code) are resolved. So I wouldn't say there is any limit (apart from the available memory).
Note that if you exceed Integer.MAX_VALUE, you'll get unexpected behaviour from some methods, like size(), but get() and put() will still work. And they will work, because the hashCode() of any object will return an int, hence by definition each object will fit in the map. And then each object will collide with an existing one.
There is no theoretical limit, but there is a limit on the number of buckets available to store different entry chains (each stored under a different hash key). Once you reach this limit, every new addition will result in a hash collision -- but this is not a problem except for performance...
I agree with @Bozho's answer and will also add that you should read the Javadoc on HashMap carefully. Note how it discusses the initial capacity and load factor and how they'll affect the performance of the HashMap.
HashMap is perfectly fine for holding large sets of data (as long as you don't run out of keys or memory) but performance can be an issue.
You may need to look in to distributed caches/data grids if you find you can't manipulate the datasets you need in a single Java/JVM program.