Difference between Bucket and Node in HashMap [duplicate] - java

Recently, in an interview, I was asked what exactly a bucket in a HashMap is: is it an array, an ArrayList, or something else?
I got confused. I know HashMaps are backed by arrays. So can I say that the bucket is an array with an initial capacity of 16 that stores hash codes, and from which linked lists get their start pointers?
I know how a HashMap works internally; I just wanted to know what exactly a bucket is in terms of data structures.

No, a bucket is each element in the array you are referring to. In earlier Java versions, each bucket contained a linked list of Map entries. Since Java 8, each bucket contains either a linked list of entries or, once the bucket grows large, a tree structure of entries.
From the implementation notes in Java 8:
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
...

I hope this may help you to understand the implementation of hash map well.

The buckets are, concretely, an array of Nodes, so a single bucket is a reference to an instance of java.util.HashMap.Node (or null if the bucket is empty). Each Node is the head of a chain that works like a singly linked list or, since Java 8, part of a red-black tree similar to the one in TreeMap. HashMap decides by itself which form is better for performance. The tree form is chosen only when many entries land in a single bucket (roughly: more than 8 entries, in a table of at least 64 buckets), which typically indicates a poorly designed hashCode() function.
This is how the bucket array is declared in HashMap:
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
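To make the "array of buckets, each holding a chain of nodes" picture concrete, here is a minimal sketch of a chained hash table. It is not the JDK implementation: the class name TinyChainedMap, its methods, and the fixed bucket count are assumptions made for illustration, it never resizes or treeifies, and it prepends to the chain for brevity.

```java
import java.util.Objects;

/** Toy chained hash table (hypothetical, for illustration only). */
class TinyChainedMap<K, V> {
    /** One entry in a bucket's chain; plays the role of HashMap.Node. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // The backing array: each element is one bucket (null when empty).
    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    TinyChainedMap(int bucketCount) {
        table = (Node<K, V>[]) new Node[bucketCount];
    }

    // Map a key to a bucket index; floorMod keeps the result non-negative.
    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // key already present: overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // prepend to the chain
        size++;
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    // Number of nodes chained in one bucket.
    int chainLength(int bucket) {
        int count = 0;
        for (Node<K, V> n = table[bucket]; n != null; n = n.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<>(4);
        for (int k = 0; k < 10; k++) {
            map.put(k, "v" + k);
        }
        System.out.println("size = " + map.size());
        System.out.println("bucket 0 chain length = " + map.chainLength(0));
        System.out.println("get(8) = " + map.get(8));
    }
}
```

With only 4 buckets and 10 keys, several keys necessarily share a bucket, yet every key stays retrievable because lookups walk the chain and compare keys with equals().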

A HashMap bucket is a slot that can hold multiple nodes. The map picks the slot for each entry by an index calculation on the key's hash, and the nodes within one bucket are connected as a linked list.

In layman's terms: "bucket" is a general data-structure term and is not specific to HashMap.
The objects whose hash codes map to a particular index are stored in that bucket. (You can think of the bucket as the head of a linked list that is selected by the hash code.)
The references to the objects are stored in the linked list whose head is that bucket.
The HashMap creates the buckets itself; their number is the table capacity (16 by default), and it grows as the map is resized.

Related

HashMap has backing array, then why is it unordered

I read that HashMap has a backing array, where entries are stored (marked with bucket number, initial size 16). Arrays are ordered, and I can call get(n) to get the element at nth position. Then why is HashMap unordered and has no get(n) method?
It depends on your view of what ordered means.
Indeed, HashMaps internally use an array, which has a fixed ordering. However, that order has nothing to do with insertion order or anything similar. The elements are placed according to their hash values, and that placement has nothing to do with any actual ordering of the elements themselves.
So HashMaps indeed have something like a get(n) method if you think of n being the hash-value of the key-element. The method is called get(*key*) and it first computes the hash-value of the given key-element and then looks the value up on the internal structure by using get(*hash-value*) on it.
A quick search yields an image that shows the structure of a HashSet (image not included here).
Note that HashSets work much the same way as HashMaps: they use the same technique, so the same picture applies. But instead of just inserting an element, a map inserts a container that is identified by the key and additionally holds a value.
As a small overview: a hash function is a function that, given an object, computes a small value (the hash value) from its properties. The computation can usually be done fast, and a lookup in the internal array at the position given by the hash value is thus also fast.
To your specific question: as a user of a HashMap you are generally not interested in which elements hide behind hash value 1 or 2 and so on, which is why no such method is included. However, if you truly need this for a special application, then you can always try to use reflection to access the internals of your HashMap, or you could write a small wrapper around the class that provides such a method.
A HashMap is divided into individual buckets. The buckets are held in an array, and each bucket starts out as a linked list; if a bucket gets too large, it is converted to a tree structure that is sorted primarily by hash code. That fact alone destroys any guarantee it could make about preserving insertion order.
If you'd like to know more about how it's implemented, you can look at my answer to this question: HashMap Java 8 implementation
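To see the ordering point from the answers above in practice, the following small sketch fills a HashMap and a LinkedHashMap with the same keys and prints the iteration order. The class and method names (IterationOrderDemo, fill, keysInIterationOrder) are made up for this example. Only the LinkedHashMap order is guaranteed; whatever order the HashMap prints is an implementation detail that you should not rely on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Insert three keys in a fixed, non-alphabetical order.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return map;
    }

    // Snapshot of the keys in the order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: order depends on hash values and table size, not on insertion.
        System.out.println("HashMap:       " + keysInIterationOrder(fill(new HashMap<>())));
        // LinkedHashMap: documented to iterate in insertion order.
        System.out.println("LinkedHashMap: " + keysInIterationOrder(fill(new LinkedHashMap<>())));
    }
}
```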

check if Hashtable is full java

As stated in the topic, how can I check if Hashtable is full (if one could do that at all)?
Having:
HashMap<Integer, Person> p = new HashMap <>();
I imagine one needs to use e.g.
if (p.size()>p "capacity"==true)
I found somewhere that hashtables are created with a default size 11 and their capacity is automatically increased if needed... So in the end, can Hashtable ever be full?
Hashtables are created with a default size 11
That is not the size of the Hashtable; it's the number of hash buckets it has.
Obviously, a table with 11 hash buckets can hold fewer than 11 items. Perhaps less obviously, a table with 11 buckets may also hold more than 11 items, depending on the collision resolution in use.
can Hashtable ever be full?
This depends on the implementation. Hash tables that use separate chaining, such as Java's HashMap, cannot get full, even if all their buckets are exhausted, because we can continue adding items to individual chains of each bucket. However, using too few hash buckets leads to significant loss of performance.
On the other hand, hash tables with linear probing, such as Java's IdentityHashMap (which is not, strictly speaking, a valid hash-based container), can in principle get full when they run out of buckets. In practice, IdentityHashMap resizes its table before that happens.
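As a quick check of the separate-chaining claim, here is a sketch that forces every key into the same bucket and still stores all of them. BadKey is a hypothetical key class written for this example, with a deliberately constant hashCode(); the class and method names are not from any library.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    /** Hypothetical key type whose hashCode() always collides. */
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key lands in the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Insert `count` distinct but colliding keys and report how many were stored.
    static int fillWithCollisions(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("entries stored: " + fillWithCollisions(1000));
    }
}
```

The map never reports "full"; the cost of the collisions shows up as slower lookups, not as rejected insertions.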
HashMap has a theoretical maximum capacity of 1073741824 (1 << 30) buckets.
From the source code of HashMap:
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
In practice, the capacity is limited by how large an array (used as the backing array) the JVM can allocate. The JVM might fail with an OutOfMemoryError when you try to allocate big arrays.
That said, the table is resized based on the number of entries, not on how well they are distributed. If the keys are badly distributed (too many entries in too few buckets), the extra entries end up in longer list or tree structures inside those buckets, depending on the nature of the keys.
The capacity argument provides a hint to the implementation of an initial size for its internal table. This can save a few internal resizes.
However, a HashMap won't stop accepting put()s unless the JVM encounters an OutOfMemoryError.
Under the covers, a HashMap is an array. Hashes, reduced to the array length, are used as array indices. Each array element is a reference to a linked list of Entry objects. The linked list can be arbitrarily long.
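The following sketch illustrates that the capacity argument is only a hint: a HashMap created with a tiny initial capacity keeps accepting entries well beyond it. The class and method names (CapacityHintDemo, sizeAfterInserting) are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityHintDemo {
    // Create a map with the given initial capacity, insert `entries` keys,
    // and return the resulting size.
    static int sizeAfterInserting(int initialCapacity, int entries) {
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < entries; i++) {
            map.put(i, "value-" + i); // the table is resized internally as needed
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterInserting(2, 10_000));
    }
}
```

Choosing a larger initial capacity does not change the result, it only avoids some of the intermediate resizes.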

Which Java Collection should I use?

The question "How can I efficiently select a Standard Library container in C++11?" contains a handy flow chart to use when choosing C++ collections.
I thought that this was a useful resource for people who are not sure which collection they should be using so I tried to find a similar flow chart for Java and was not able to do so.
What resources and "cheat sheets" are available to help people choose the right Collection to use when programming in Java? How do people know what List, Set and Map implementations they should use?
Since I couldn't find a similar flowchart I decided to make one myself.
This flow chart does not try and cover things like synchronized access, thread safety etc or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.
This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.
Other resources
Probably the most useful other reference is the following page from the Oracle documentation, which describes each Collection.
HashSet vs TreeSet
There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
ArrayList vs LinkedList
Detailed discussion: When to use LinkedList over ArrayList?
Summary of the major non-concurrent, non-synchronized collections
Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (random).
Set: An interface representing a Collection with no duplicates.
HashSet: A Set backed by a hash table (internally a HashMap). Usually the fastest, with the smallest memory usage, when ordering is unimportant.
LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
EnumSet: An extremely fast and efficient Set customized for a single enum type.
List: An interface representing a Collection whose elements are ordered and each have a numeric index representing its position, where zero is the first element, and (length - 1) is the last.
ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size would exceed capacity (when the (capacity + 1)-th element is added), the array is recreated with a new capacity of roughly 1.5 times the old one. This recreation is fast, since it uses System.arraycopy(). Deleting and inserting elements requires all neighboring elements (to the right) to be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find its location. In most situations, an ArrayList is preferred over a LinkedList.
LinkedList: A List backed by a set of objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal, is a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
Map: An interface representing a group of elements where each element has an identifying "key", so each element is a key-value pair. (In Java, Map does not extend the Collection interface.)
HashMap: A Map where keys are unordered, backed by a hash table.
LinkedHashMap: A Map where keys are ordered by insertion order.
TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
Stack: A legacy class (not an interface) that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out). Deque implementations such as ArrayDeque are generally preferred.
Deque: Short for "double ended queue", usually pronounced "deck". An interface for a queue that is typically only added to and read from either end (not the middle); LinkedList and ArrayDeque implement it.
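As a small illustration of the Set entries in the summary above, this sketch adds the same strings to a HashSet, a LinkedHashSet and a TreeSet and prints each one. The class name SetOrderingDemo and the helper collect are made up for the example; the HashSet order is unspecified and may differ between runs or JDK versions.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class SetOrderingDemo {
    // Add a fixed sequence (with one duplicate) and return the set's string form.
    static String collect(Set<String> set) {
        for (String s : new String[] {"pear", "apple", "pear", "fig"}) {
            set.add(s); // the second "pear" is ignored: sets hold no duplicates
        }
        return set.toString();
    }

    public static void main(String[] args) {
        System.out.println("HashSet:       " + collect(new HashSet<>()));       // unspecified order
        System.out.println("LinkedHashSet: " + collect(new LinkedHashSet<>())); // insertion order
        System.out.println("TreeSet:       " + collect(new TreeSet<>()));       // sorted order
    }
}
```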
Basic collection diagrams:
Comparing the insertion of an element with an ArrayList and LinkedList:
Even simpler picture is here. Intentionally simplified!
Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.
List is an indexed collection of data where each element has an index. Something like the array, but more flexible.
Data in the list keep the order of insertion.
Typical operation: get the n-th element.
Set is a bag of elements, each element just once (the elements are distinguished using their equals() method).
Data in the set are stored mostly just to know what data are there.
Typical operation: tell if an element is present in the set.
Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)
Data in Map are searchable by their key.
Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
The differences
Set vs. Map: in Set you search data by themselves, whilst in Map by their key.
N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.
List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key, which is of any type (typically: ID)
List vs. Set: in List the elements are bound by their position and can be duplicate, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet)
It is simple: if you need to store values with keys mapped to them go for the Map interface, otherwise use List for values which may be duplicated and finally use the Set interface if you don’t want duplicated values in your collection.
Here is the complete explanation http://javatutorial.net/choose-the-right-java-collection , including flowchart etc
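The "typical operations" mentioned in the answers above can be put side by side in a few lines. This is only a sketch; the class TypicalOperationsDemo, its helper methods and the sample data are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TypicalOperationsDemo {
    // Set: duplicates collapse, so the size is the number of distinct items.
    static int distinctCount(List<String> items) {
        return new HashSet<>(items).size();
    }

    // Map: look up a value by a key of any type (here an Integer id).
    static String nameById(Map<Integer, String> byId, int id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        // List: keeps duplicates and positions; typical operation is get(n).
        List<String> names = Arrays.asList("Ann", "Bob", "Ann");
        System.out.println("names.get(2) = " + names.get(2));

        // Set: typical operation is a membership test.
        Set<String> unique = new HashSet<>(names);
        System.out.println("contains Bob? " + unique.contains("Bob"));
        System.out.println("distinct = " + distinctCount(names));

        // Map: typical operation is get(key).
        Map<Integer, String> byId = new HashMap<>();
        byId.put(7, "Ann");
        byId.put(9, "Bob");
        System.out.println("id 9 -> " + nameById(byId, 9));
    }
}
```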
Map
If choosing a Map, I made this table summarizing the features of each of the ten implementations bundled with Java 11.

Does Java use indexes for Data structure like Oracle

When we create a Collection (ArrayList, HashMap) in Java, does Java internally create some kind of index for faster retrieval of data? In Oracle we have to create indexes manually, but what is the technique (if any) used in Java?
For ArrayList, each Object has a unique index (even duplicate objects).
An object can easily be accessed by its index using ArrayList.get(). The index is based on the order in which objects are added (assuming you haven't sorted the ArrayList or otherwise changed the order). When an object is removed from an ArrayList, all elements after it (with a larger index) are shifted to the left so that their indices become index - 1.
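The index shift on removal is easy to observe. A minimal sketch, with the class name ArrayListShiftDemo and the helper removeAt invented for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayListShiftDemo {
    // Return a copy of `source` with the element at `index` removed.
    static List<String> removeAt(List<String> source, int index) {
        List<String> copy = new ArrayList<>(source);
        copy.remove(index); // remove(int): later elements shift one position left
        return copy;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "d");
        List<String> after = removeAt(letters, 1);
        System.out.println("before: " + letters + ", index of c = " + letters.indexOf("c"));
        System.out.println("after:  " + after + ", index of c = " + after.indexOf("c"));
    }
}
```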
A HashMap uses a slightly more complex indexing scheme...
First of all, a HashMap hides all indexing information from you, so you don't need to know this unless you want to understand its internal workings (which is a good thing!). It does use indexing, however. A HashMap uses an array of Entry objects (its own implementation of Map.Entry, named Node since Java 8) to store information. An Entry represents a node in a linked list (not to be confused with the class java.util.LinkedList), and it stores a key, a value, and the next node in the linked list.
The index of an entry in a HashMap is simply h & (length - 1), where h is the hashCode of the key, passed through a custom hashing method internal to the java.util package (you won't be able to access it), and length is a power-of-two integer representing the size of the array of entries (this will automatically grow if need be).
Of course there may be collisions if two keys end up computing the same index. This is why HashMap uses an array of linked lists. In case of a collision, where two entries map to the same index, one is linked after the other.
To get an object from a HashMap, the index is calculated from the key you provide through get(key), and the relevant Entry is retrieved from the array. Now that the map has the first node in a linked list, it iterates over the elements of this linked list until it finds the key equal to the key you provided.
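The index formula can be tried out directly. The sketch below mirrors the Java 8 approach of XOR-ing the upper 16 bits of hashCode() into the lower bits before masking; the class and method names (BucketIndexDemo, spread, indexFor) are my own, and the real method inside HashMap is not accessible from user code.

```java
class BucketIndexDemo {
    // Spread the higher bits of the hash code downward (as Java 8's HashMap does),
    // so that small tables are not affected only by the low bits.
    static int spread(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(Object key, int length) {
        return spread(key) & (length - 1);
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97 and Integer 17 hashes to 17:
        // both map to the same bucket in a table of length 16 ...
        System.out.println(indexFor("a", 16) + " " + indexFor(17, 16));
        // ... but land in different buckets once the table has grown to 32.
        System.out.println(indexFor("a", 32) + " " + indexFor(17, 32));
    }
}
```

This also shows why a collision in the index does not require equal hash codes: two different hashes can share the same low bits.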
Yes, Java does hide these implementation details. But given the performance that Java provides, some indexing technique is used internally. When you refer to "Oracle", I believe you mean the SQL database software and not a language like Java.

Java Collection - Concrete data structure under the hood for dictionary/tree based abstract data structure

My question is: what are the fundamental/concrete data structures (like arrays) used to implement abstract data structures such as the variations of maps/trees?
I'm looking for what is really used in the Java collections, not theoretical answers.
Based on quick code review of Sun/Oracle JDK. You can easily find the details yourself.
Lists/queues
ArrayList
Growing Object[] elementData field. Can hold 10 elements by default, and grows by around 50% when it cannot hold more objects, copying the old array to a bigger new one. Does not shrink when removing items.
LinkedList
Reference to an Entry, which in turn holds references to the actual element and to the previous and next entries (if any).
ArrayDeque
Similar to ArrayList, but also holding two pointers into the internal E[] elements array: head and tail. Both adding and removing elements on either side is just a matter of moving these pointers. The array doubles in size when it is too small.
Maps
HashMap
Growing Entry[] table field holding so-called buckets. Each bucket contains a linked list of the entries whose key hash, modulo the table size, is the same.
TreeMap
Entry<K,V> root reference holding the root of the red-black balanced tree.
ConcurrentHashMap
Similar to HashMap, but access to each group of buckets (called a segment) is synchronized by an independent lock. (Since Java 8, the implementation locks individual buckets instead of segments.)
Sets
TreeSet
Uses TreeMap underneath (!)
HashSet
Uses HashMap underneath (!)
BitSet
Uses long[] words field to be as memory efficient as possible. Up to 64 bits can be stored in one element.
There is of course one answer for each implementation. Look at the javadocs, they often describe these things. http://docs.oracle.com/javase/7/docs/api/
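To illustrate the "HashSet uses HashMap underneath" remark, here is a stripped-down sketch of a set built on top of a map with a shared dummy value. It is not the JDK source: MapBackedSet and its methods are hypothetical, and it implements only a few operations.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal set built on a map (illustrative sketch, not the JDK's HashSet). */
class MapBackedSet<E> {
    // Shared dummy value: only the keys of the map carry information.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not already present.
    boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    boolean contains(Object element) {
        return map.containsKey(element);
    }

    // Returns true if the element was present and has been removed.
    boolean remove(Object element) {
        return map.remove(element) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("x"));      // first insertion
        System.out.println(set.add("x"));      // duplicate is rejected
        System.out.println(set.contains("x"));
        System.out.println(set.size());
    }
}
```

Swapping the HashMap for a TreeMap in this sketch would give the sorted behaviour that TreeSet gets from TreeMap.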
