They say HashMap's put and get operations have constant time complexity. Is it still going to be O(1) if it is implemented with a dynamic array?
For example:
import java.util.ArrayList;
import java.util.LinkedList;

public class HashMap<K, V> {

    private static class Entry<K, V> {
        private final K key;
        private final V value;

        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // each bucket holds the chain of entries whose hash maps to that index
    private ArrayList<LinkedList<Entry<K, V>>> buckets = new ArrayList<>();

    // the rest of the implementation
    // ...
}
HashMap already uses a dynamic array:
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
Using an ArrayList instead of a manually resized array does not increase the time complexity.
The issue for puts is how often you're going to need to extend the ArrayList.
This is probably little different in overhead from extending a plain array: you have to allocate a new one and rehash.
Note that you'll need to know the intended ArrayList size in order to compute the hash index (as hash code % array size), so you should allocate the ArrayList with that capacity up front and then populate it with nulls, since list elements don't exist until they have been added.
The same applies when you rehash.
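For concreteness, here is a minimal sketch of that set-up step. The class name, the fixed capacity of 16 and the bare String keys are all just assumptions for the example; a real implementation would wrap this inside the map class itself.

import java.util.ArrayList;
import java.util.LinkedList;

public class BucketSetupDemo {
    public static void main(String[] args) {
        int capacity = 16;                                   // chosen up front, like a plain array's length
        ArrayList<LinkedList<String>> buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buckets.add(null);                               // slots must exist before get(index)/set(index) can be used
        }

        String key = "example";
        int index = Math.floorMod(key.hashCode(), capacity); // hash index depends on the intended table size
        if (buckets.get(index) == null) {
            buckets.set(index, new LinkedList<>());
        }
        buckets.get(index).add(key);
        System.out.println("stored \"" + key + "\" in bucket " + index);
    }
}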
You can of course do it wrong: you could compute a size for use in computing the hash index, and then not extend the ArrayList accordingly. Then you'd be forced to extend the ArrayList whenever you computed an index higher than any you'd seen before, which may require reallocation internal to the ArrayList.
In short: there's no significant performance penalty for using an ArrayList if you do it in a reasonable way, but no particular benefit either.
What is the time complexity of a HashMap implemented using dynamic array?
The short answer is: It depends on how you actually implement the put and get methods, and the rehashing. However, assuming that you have gotten it right, then the complexity would be the same as with classic arrays.
Note that a typical hash table implementation will not benefit from using a dynamic array (aka a List).
In between resizes, a hash table has an array whose size is fixed. Entries are added and removed from buckets, but the number of buckets and their positions in the array do not change. Since the code is not changing the array size, there is no benefit in using dynamic arrays in between the resizes.
When the hash table resizes, it needs to create a new array (typically about twice the size of the current one). Then it recomputes the entry -> bucket mappings and redistributes the entries. A new array is required because it is not feasible to redistribute the entries "in place", and the size of the new array is known (and fixed) at the point where it is allocated. So again there is no benefit here from using a dynamic array.
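As an illustration of that resize step, here is a simplified sketch with made-up names. It is not how java.util.HashMap actually implements resizing; it only shows that the new array has a fixed, known size and that every existing entry gets relinked into it.

class TinyHashTable<K, V> {

    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] oldTable = table;
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[oldTable.length * 2]; // size is known and fixed at allocation
        for (Node<K, V> head : oldTable) {                                    // walk every old bucket
            Node<K, V> node = head;
            while (node != null) {
                Node<K, V> next = node.next;
                int index = node.hash & (newTable.length - 1);                // recompute the entry -> bucket mapping
                node.next = newTable[index];                                  // relink the node into its new bucket
                newTable[index] = node;
                node = next;
            }
        }
        table = newTable;
    }
}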
Add to this that for all primitive operations on an array, the equivalent operations on a dynamic array (i.e. ArrayList¹) carry a small performance overhead. So there will be a small performance hit from using a dynamic array in a hash table implementation.
Finally, you need to be careful when talking about complexity of hash table implementations. The average complexity of HashMap.put (and similar) is O(1) amortized.
A single put operation may be O(N) if it triggers a resize.
If the hash function is pathological, all operations can be O(N).
If you choose an inappropriate load factor, performance will suffer.
If you implement an incorrect resizing policy then performance will suffer.
(Amortized means averaged over all similar operations on the table. For example, N insertions with different keys into an empty table is O(N) ... or O(1) amortized per insertion.)
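To make the amortized claim concrete, here is a small back-of-envelope simulation. The starting capacity of 16 and the 0.75 trigger merely mirror typical defaults; the point is only that the total copying work stays proportional to N.

public class AmortizedResizeDemo {
    public static void main(String[] args) {
        int n = 1_000_000;                           // number of simulated insertions with distinct keys
        int capacity = 16;                           // hypothetical starting table size
        long entriesCopiedInResizes = 0;
        for (int size = 1; size <= n; size++) {
            if (size > capacity * 0.75) {            // a load-factor-0.75 style resize trigger
                entriesCopiedInResizes += size - 1;  // every existing entry is rehashed into the new table
                capacity *= 2;
            }
        }
        System.out.println(n + " insertions, " + entriesCopiedInResizes + " entries copied during resizes");
        // The copy count is a small constant multiple of n: O(N) total work, O(1) amortized per insertion.
    }
}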
In short: the complexity of hash tables is ... complex.
¹ A Vector will be slightly worse than an ArrayList, but a LinkedList would actually make the complexity of get and put O(N) instead of O(1). I'll leave you to figure out the details.
Related
I have data that I want to lookup by key.
My particular use case is that the data (key/value and number of elements) does not change once the map is initialised. All key/value values are known at once.
I have generally used a HashMap for this with the default constructor (default initial capacity and load factor).
What is the best way to build this Map? If I were to use a HashMap, what should the initial capacity and load factor be set to? Is Map.copyOf() a better solution? Does the size of the map matter (20 elements vs 140,000)?
This article https://docs.oracle.com/en/java/javase/15/core/creating-immutable-lists-sets-and-maps.html#GUID-6A9BAE41-A1AD-4AA1-AF1A-A8FC99A14199 seems to imply that the immutable Map returned by Map.copyOf() is more space-efficient.
HashMap is fairly close to optimal in most cases already. The array of buckets doubles in capacity each time, so it's most wasteful when you have (2^N) + 1 items, since the capacity will necessarily be 2^(N+1) (i.e. 2049 items require capacity of 4096, but 2048 items fit perfectly).
In your case, specifying an initial size will only prevent a few reallocations when the map is created, which if it only happens once probably isn't relevant. Load factor is not relevant because the map's capacity will never change. In any case, if you did want to pre-size, this would be correct:
new HashMap<>(numItems, 1.0f);   // initial capacity hint = numItems, load factor = 1, so the table never resizes
Does the size of the map matter (20 elements vs 140,000)?
It will have an impact, but not a massive one. Items are grouped into buckets, and buckets are structured as lists or trees. So the performance is mostly dependent on how many items are in a given bucket, rather than the total number of items across all buckets.
What's important is how evenly distributed across your buckets the items are. A bad hash code implementation will result in clustering. Clustering will start to move O(1) operations towards O(log n), I believe.
// The worst possible hashCode implementation.
@Override
public int hashCode() { return 0; } // or any other constant
If you have the same items in the map across multiple invocations of your application (not clear from the question if that's the case), and if the class of the key is under your control, then you have the luxury of being able to tweak the hashCode implementation to positively affect the distribution, e.g. by using different prime numbers as a modulus. This would be trial and error, though, and is really only a micro-optimization.
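For example, if the key class were something like the hypothetical OrderId below, the constants in hashCode are the knobs you could experiment with (the class and its fields are invented purely for illustration):

public final class OrderId {
    private final int region;
    private final long sequence;

    public OrderId(int region, long sequence) {
        this.region = region;
        this.sequence = sequence;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OrderId)) {
            return false;
        }
        OrderId other = (OrderId) o;
        return region == other.region && sequence == other.sequence;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + region;                  // try other primes (37, 61, ...) and measure the bucket spread
        result = 31 * result + Long.hashCode(sequence);
        return result;
    }
}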
As for the comments/answers addressing how to confer immutability, I'd argue that that's a separate concern. First work out what map is actually optimal, then worry about how to confer immutability upon it, if it isn't already. You can always wrap a mutable map in Collections.unmodifiableMap. Supposedly Guava's ImmutableMap is slower than HashMap, and I suspect other immutable variants will struggle to exceed the performance of HashMap too.
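Putting those two points together, a sketch of the "size it once, then wrap it" approach might look like this (numItems and the entries are placeholders):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class PresizedMapDemo {
    public static void main(String[] args) {
        int numItems = 3;                                          // stand-in for the known element count
        Map<String, Integer> map = new HashMap<>(numItems, 1.0f);  // capacity hint plus load factor 1, as above
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        Map<String, Integer> frozen = Collections.unmodifiableMap(map); // read-only view over the tuned map
        System.out.println(frozen);
    }
}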
If hashtables/maps with closed hashing are worst-case O(n), are HashSets also going to require O(n) time for lookup, or is it constant time?
When looking up an element in a HashMap, it performs an O(1) calculation to find the right bucket, and then iterates over the items there serially until it finds the one that is equal to the requested key, or until all the items have been checked.
In the worst case scenario, all the items in the map have the same hash code and are therefore stored in the same bucket. In this case, you'll need to iterate over all of them serially, which would be an O(n) operation.
A HashSet is just a HashMap where you don't care about the values, only the keys - under the hood, it's a HashMap where all the values are a dummy Object.
If you look at the implementation of a HashSet (e.g. from OpenJDK 8: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashSet.java), you can see that it's actually just built on top of a HashMap. Relevant code snippet here:
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
The HashSet slightly optimizes memory usage by creating a single static Object named PRESENT and using it as the value part of every key/value entry put into the backing HashMap.
So whatever the performance implications are of using a HashMap, a HashSet will have more or less the same ones since it's literally using a HashMap under the covers.
To directly answer your question: in the worst case, yes; just as the worst-case complexity of a HashMap is O(n), so too the worst-case complexity of a HashSet is O(n).
It is worth noting that, unless you have a really bad hash function or are using a hashtable of a ridiculously small size, you're very unlikely to see the worst case performance in practice. You'd have to have every element hash to the exact same bucket in the hashtable so the performance would essentially degrade to a linked list traversal (assuming a hashtable using chaining for collision handling, which the Java ones do).
Worst case is O(N) as mentioned, average and amortized run time is constant.
From GeeksForGeeks:
The underlying data structure for HashSet is a hashtable. So the amortized (average or usual case) time complexity for the add, remove and look-up (contains method) operations of HashSet is O(1).
I see a lot of people saying the worst case is O(n). This is because the old HashSet implementation used a linked list to handle collisions within the same bucket. However, that is not the definitive answer.
In Java 8, such a linked list is replaced by a balanced binary tree when the number of collisions in a bucket grows past a threshold. This improves the worst-case performance of lookups from O(n) to O(log n).
You can check additional details here.
http://openjdk.java.net/jeps/180
https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8
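To see the Java 8+ behaviour in action, here is an illustrative, deliberately pathological key type. The class is made up; the point is only that lookups remain usable even though every element lands in the same bucket, because that bucket gets treeified (the key should be Comparable so the tree can order entries effectively).

import java.util.HashSet;
import java.util.Set;

public final class CollidingKey implements Comparable<CollidingKey> {
    private final int id;

    public CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42;                                   // pathological: every key hashes to the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);        // lets the tree bin order the colliding keys
    }

    public static void main(String[] args) {
        Set<CollidingKey> set = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            set.add(new CollidingKey(i));
        }
        System.out.println(set.contains(new CollidingKey(99_999))); // O(log n) per lookup, not O(n)
    }
}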
What are the main differences between an ArrayList and an ArrayMap? Which one is more efficient and faster for non-threaded applications?
The documentation says ArrayMap is a generic key->value mapping data structure, so what are the differences between ArrayMap and HashMap? Are they the same?
ArrayMap keeps its mappings in an array data structure — an integer array of hash codes for each item, and an Object array of the key -> value pairs.
ArrayList, on the other hand, is a List: a resizable-array implementation of the List interface that implements all optional list operations and permits all elements, including null.
FYI
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional HashMap.
Note that ArrayMap implementation is not intended to be appropriate for
data structures that may contain large numbers of items. It is
generally slower than a traditional HashMap, since lookups require a
binary search and adds and removes require inserting and deleting
entries in the array. For containers holding up to hundreds of items,
the performance difference is not significant, less than 50%.
FROM DOCS
ArrayList
The ArrayList class extends AbstractList and implements the List interface. ArrayList supports dynamic arrays that can grow as needed.
Array lists are created with an initial size. When this size is exceeded, the collection is automatically enlarged. When objects are removed, the array may be shrunk.
Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
ArrayMap
ArrayMap is a generic key->value mapping data structure that is designed to be more memory efficient than a traditional HashMap, this implementation is a version of the platform's android.util.ArrayMap that can be used on older versions of the platform. It keeps its mappings in an array data structure -- an integer array of hash codes for each item, and an Object array of the key/value pairs. This allows it to avoid having to create an extra object for every entry put in to the map, and it also tries to control the growth of the size of these arrays more aggressively (since growing them only requires copying the entries in the array, not rebuilding a hash map).
Read Arraymap vs Hashmap
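For intuition, here is a very rough sketch (not Android's actual ArrayMap code) of the lookup scheme described above: hash codes kept sorted in an int[], keys and values interleaved in an Object[]. Only get is shown; put and remove would have to insert into or delete from the middle of both arrays, which is why adds and removes are comparatively slow.

import java.util.Arrays;

class TinyArrayMap<K, V> {
    private int size = 0;
    private int[] hashes = new int[8];
    private Object[] keysAndValues = new Object[16];             // layout: [key0, value0, key1, value1, ...]

    @SuppressWarnings("unchecked")
    V get(K key) {
        int hash = key.hashCode();
        int index = Arrays.binarySearch(hashes, 0, size, hash);  // O(log n) search of the sorted hash array
        if (index < 0) {
            return null;                                         // hash not present, so the key is not present
        }
        // A full implementation would also scan neighbouring slots holding the same hash and call equals() on each.
        if (key.equals(keysAndValues[index << 1])) {
            return (V) keysAndValues[(index << 1) + 1];
        }
        return null;
    }
}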
As stated in the topic, how can I check if Hashtable is full (if one could do that at all)?
Having:
HashMap<Integer, Person> p = new HashMap<>();
I imagine one needs to use e.g.
if (p.size() > p."capacity")   // pseudocode: HashMap exposes no capacity accessor
I found somewhere that hashtables are created with a default size 11 and their capacity is automatically increased if needed... So in the end, can Hashtable ever be full?
Hashtables are created with a default size 11
That is not the size of a Hashtable, it's the number of hash buckets it has.
Obviously, a table with 11 hash buckets can hold fewer than 11 items. Perhaps less obviously, a table with 11 buckets may also hold more than 11 items, depending on the collision resolution in use.
can Hashtable ever be full?
This depends on the implementation. Hash tables that use separate chaining, such as Java's HashMap, cannot get full even when every bucket is occupied, because we can keep adding items to each bucket's individual chain. However, using too few hash buckets leads to a significant loss of performance.
On the other hand, hash tables that use linear probing, such as Java's IdentityHashMap (which, strictly speaking, is not a general-purpose Map implementation), can get full when you run out of buckets.
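To illustrate what "full" means for a probing table, here is a minimal linear-probing sketch that deliberately never resizes (real implementations, including IdentityHashMap, grow their tables long before this happens):

class FixedProbingTable<K, V> {
    private final Object[] keys;
    private final Object[] values;

    FixedProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    boolean put(K key, V value) {
        int start = Math.floorMod(key.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;                // probe the next slot on a collision
            if (keys[slot] == null || keys[slot].equals(key)) {  // empty slot, or updating an existing key
                keys[slot] = key;
                values[slot] = value;
                return true;
            }
        }
        return false;                                            // every slot probed and occupied: the table is full
    }
}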
HashMap has a maximum table capacity of 1073741824 (1 << 30) buckets, theoretically.
From the source code of HashMap:
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
But in practice it is limited by the size of array the JVM can allocate for the backing table; the JVM may fail with an OutOfMemoryError when you try to allocate very large arrays.
That said, if the keys are really badly distributed (a few buckets holding most of the entries), the HashMap won't need to allocate or reallocate big arrays; instead it will keep allocating more tree or list nodes inside those buckets, depending on the nature of the keys.
The capacity argument provides a hint to the implementation of an initial size for its internal table. This can save a few internal resizes.
However, a HashMap won't stop accepting put()s unless the JVM encounters an OutOfMemoryError.
Under the covers, a hashmap is an array. Hashes are used as array indices. Each array element is a reference to a linked list of Entry objects. The linked list can be arbitrarily long.
I thought linked lists were supposed to be faster than an ArrayList when adding elements? I just did a test of how long it takes to add, sort, and search for elements (ArrayList vs LinkedList vs HashSet). I was just using the java.util classes for ArrayList and LinkedList... using the add(Object) methods available to each class.
ArrayList outperformed LinkedList both in filling the list... and in a linear search of the list.
Is this right? Did I do something wrong in the implementation maybe?
***************EDIT*****************
I just want to make sure I'm using these things right. Here's what I'm doing:
public class LinkedListTest {
    private List<String> Names;

    public LinkedListTest() {
        Names = new LinkedList<String>();
    }
}
Then I'm just using LinkedList methods, i.e. "Names.add(strings)". And when I tested ArrayLists, it's nearly identical:
public class ArrayListTest {
    private List<String> Names;

    public ArrayListTest() {
        Names = new ArrayList<String>();
    }
}
Am I doing it right?
Yes, that's right. LinkedList will have to do a memory allocation on each insertion, while ArrayList is permitted to do fewer of them, giving it amortized O(1) insertion. Memory allocation looks cheap, but may actually be very expensive.
The linear search time is likely slower in LinkedList due to locality of reference: the ArrayList elements are closer together, so there are fewer cache misses.
When you plan to insert only at the end of a List, ArrayList is the implementation of choice.
Remember that:
there's a difference in "raw" performance for a given number of elements, and in how different structures scale;
different structures perform differently at different operations, and that's essentially part of what you need to take into account in choosing which structure to use.
So, for example, a linked list has more to do in adding to the end, because it has an additional object to allocate and initialise per item added, but whatever that "intrinsic" cost per item, both structures will have O(1) performance for adding to the end of the list, i.e. have an effectively "constant" time per addition whatever the size of the list, but that constant will be different between ArrayList vs LinkedList and likely to be greater for the latter.
On the other hand, a linked list has constant time for adding to the beginning of the list, whereas in the case of an ArrayList, the elements must be "shuftied" along, an operation that takes some time proportional to the number of elements. But, for a given list size, say, 100 elements, it may still be quicker to "shufty" 100 elements than it is to allocate and initialise a single placeholder object of the linked list (but by the time you get to, say, a thousand or a million objects or whatever the threshold is, it won't be).
So in your testing, you probably want to consider both the "raw" time of the operations at a given size and how these operations scale as the list size grows.
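A rough way to see this yourself is the comparison below. The timing is completely unscientific (a proper benchmark would use JMH and warm-up), so treat the numbers as indicative of the shape of the difference only.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class AddAtFrontDemo {
    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("ArrayList:  " + timeAddAtFront(new ArrayList<>(), n) + " ms");
        System.out.println("LinkedList: " + timeAddAtFront(new LinkedList<>(), n) + " ms");
    }

    private static long timeAddAtFront(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(0, i);   // ArrayList must shift every existing element; LinkedList just relinks the head
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}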
Why did you think LinkedList would be faster? In the general case, an insert at the end of an ArrayList is simply a case of updating the reference in a single array cell (with O(1) random access). The LinkedList insert is also O(1) at the end, but it must allocate a "cell" object to hold the entry and update a pair of pointers, as well as ultimately setting the reference to the object being inserted.
Of course, periodically the ArrayList's backing array may need to be resized (which won't happen if it was created with a large enough initial capacity), but since the array grows exponentially, only O(lg n) resizes ever occur and the total copying work is O(n), so the amortized cost per insertion stays O(1).
Simply put - inserts into array lists are much simpler and therefore much faster overall.
Linked lists may be slower than array lists in these cases for a few reasons. If you are inserting at the end of the list, it is likely that the array list already has that space allocated. The underlying array is usually grown in large chunks, because growing it is a time-consuming process. So, in most cases, adding an element at the back requires only storing a reference, whereas the linked list needs to create a node. Adding at the front or in the middle should give different performance for both types of list.
Linear traversal of the list will always be faster in an array-based list because it only has to walk the array, one dereferencing operation per cell. In a linked list, the nodes of the list must also be dereferenced, taking roughly double the amount of time.
When adding an element at the back of a LinkedList (in Java, LinkedList is actually a doubly linked list) it is an O(1) operation, as is adding an element at the front. Adding an element at the ith position is roughly an O(i) operation.
So, if you were adding to the front of the list, a LinkedList would be significantly faster.
ArrayList is faster at accessing data by index, but slower when inserting elements into the middle of the list: with a linked list you only have to change reference values, whereas with an array list you have to shift every element after the insertion index one position along.
EDIT: Isn't there a linked list implementation that keeps track of the last element? That would speed up inserting at the end of a linked list.