HashMap has backing array, then why is it unordered - java

I read that HashMap has a backing array, where entries are stored (marked with bucket number, initial size 16). Arrays are ordered, and I can call get(n) to get the element at nth position. Then why is HashMap unordered and has no get(n) method?

It depends on your view of what ordered means.
Indeed HashMapss internally use an array or another collection that has a fixed ordering. However the order has nothing to do with insertion order or something like that. The elements are ordered, for example, in increasing size of their hash-values and they have nothing to do with some actual ordering on the elements themselves.
So HashMaps indeed have something like a get(n) method if you think of n being the hash-value of the key-element. The method is called get(*key*) and it first computes the hash-value of the given key-element and then looks the value up on the internal structure by using get(*hash-value*) on it.
Here is an image a quick search yield that shows the structure of HashSets:
Note that HashSets are kinda the same than HashMaps, they use the same technique and the same image applies. But instead of just inserting an element a map inserts a container that is identified by the key and additionally holds a value.
Just as a small overview. A hash-function is a function that given an object computes a small value, the hash-value out of it, using its properties. The computation usually can be done fast and a lookup on the internal array at the position given by the hash-value is thus also fast.
To your specific question, as an user of a HashMap you generally are not interested in what elements specifically hide behind hash-value 1 or 2 and so on, that is why they did not include such a method. However if you truly need to do that for a special application or so than you can always try to use Reflection to access the internals of your HashMap or you could also just write a small wrapper around the class that provides such a method.

A HashMap is divided into individual buckets. Buckets are initially backed by an array, however if the buckets get too large then they are converted to tree structures which are sorted based on hash codes. That fact alone destroys any guarantee it could make about preserving insertion order.
If you'd like to know more about how it's implemented, you can look at my answer to this question: HashMap Java 8 implementation

Related

Which Java Collection should I use?

In this question How can I efficiently select a Standard Library container in C++11? is a handy flow chart to use when choosing C++ collections.
I thought that this was a useful resource for people who are not sure which collection they should be using so I tried to find a similar flow chart for Java and was not able to do so.
What resources and "cheat sheets" are available to help people choose the right Collection to use when programming in Java? How do people know what List, Set and Map implementations they should use?
Since I couldn't find a similar flowchart I decided to make one myself.
This flow chart does not try and cover things like synchronized access, thread safety etc or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.
This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.
Other resources
Probably the most useful other reference is the following page from the oracle documentation which describes each Collection.
HashSet vs TreeSet
There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
ArrayList vs LinkedList
Detailed discussion: When to use LinkedList over ArrayList?
Summary of the major non-concurrent, non-synchronized collections
Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (random).
Set: An interface representing a Collection with no duplicates.
HashSet: A Set backed by a Hashtable. Fastest and smallest memory usage, when ordering is unimportant.
LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
EnumSet: An extremely fast and efficient Set customized for a single enum type.
List: An interface representing a Collection whose elements are ordered and each have a numeric index representing its position, where zero is the first element, and (length - 1) is the last.
ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size exceeds capacity (when the (capacity + 1)-th element is added), the array is recreated with a new capacity of (new length * 1.5)--this recreation is fast, since it uses System.arrayCopy(). Deleting and inserting/adding elements requires all neighboring elements (to the right) be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find it's location. In most situations, an ArrayList is preferred over a LinkedList.
LinkedList: A List backed by a set of objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal is a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
Map: An interface representing an Collection where each element has an identifying "key"--each element is a key-value pair.
HashMap: A Map where keys are unordered, and backed by a Hashtable.
LinkedhashMap: Keys are ordered by insertion order.
TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
Stack: An interface that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out).
Deque: Short for "double ended queue", usually pronounced "deck". A linked list that is typically only added to and read from either end (not the middle).
Basic collection diagrams:
Comparing the insertion of an element with an ArrayList and LinkedList:
Even simpler picture is here. Intentionally simplified!
Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.
List is an indexed collection of data where each element has an index. Something like the array, but more flexible.
Data in the list keep the order of insertion.
Typical operation: get the n-th element.
Set is a bag of elements, each elements just once (the elements are distinguished using their equals() method.
Data in the set are stored mostly just to know what data are there.
Typical operation: tell if an element is present in the list.
Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)
Data in Map are searchable by their key.
Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
The differences
Set vs. Map: in Set you search data by themselves, whilst in Map by their key.
N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.
List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key which os of any type (typically: ID)
List vs. Set: in List the elements are bound by their position and can be duplicate, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet)
It is simple: if you need to store values with keys mapped to them go for the Map interface, otherwise use List for values which may be duplicated and finally use the Set interface if you don’t want duplicated values in your collection.
Here is the complete explanation http://javatutorial.net/choose-the-right-java-collection , including flowchart etc
Map
If choosing a Map, I made this table summarizing the features of each of the ten implementations bundled with Java 11.
Common collections, Common collections

Does Java use indexes for Data structure like Oracle

When we create a Collection (ArrayList,HashMap) in Java, does Java internally create some kind of index for faster retrieval of data ? In Oracle we have to manually create indexes but what is the technique (if any) used in Java
For ArrayList, each Object has a unique index (even duplicate objects).
An object can easily be accessed by its index using ArrayList.get(). The index is based on the order Objects are added (assuming you haven't sorted the ArrayList or otherwise changed the order). When an object is removed from an ArrayList, all elements in front of it (with a larger index) are shifted to the left so that their indices become index - 1.
A HashMap uses a slightly more complex indexing scheme...
For a HashMap, all indexing information is hidden from you first of all, so you don't really need to know this unless you want to understand its internal workings (which is a good thing!) It does use indexing however... HashMaps use an array of Entrys (its own implementation of Map.Entry) to store information. Entry represents a node in a linked list (not to be confused with the object java.util.LinkedList) and it stores a key, a value, and the next node in the linked list.
The index of an entry in a HashMap is simply h & (length - 1) where h is the hashCode of the key, passed through a custom hashing method internal to the java.util package (you won't be able to access it), and length is a power-of-two integer representing the size of the array of Entrys (this will automatically grow if need be).
Of course there may be some collisions if two keys end up computing the same hash. This is why HashMap uses an array of linked lists. In case of a collision, where two Entrys have the same hash, one can be tagged to the end of the other.
To get an object in HashMap, the index is calculated from the key you provide through get(key) and the relevant Entry is retrieved from the array of Entrys. Now that the map has the first node in a linked list, it will iterate over all elements of this linked list until it finds the key equal to the key you provided.
yes.. Java does hide these implementation details. But for the performance that java provides, there may be some indexing technique internally used. When you refer to "Oracle" I believe its the SQL or database software and not a language like Java.

Separate chaining in Java

Our class is learning about hash tables, and one of my study questions involves me creating a dictionary using a hash table with separate chaining. However, the catch is that we are not allowed to use Java's provided methods for creating hash tables. Rather, our lecture notes mention that separate chaining involves each cell in an array pointing to a linked list of entries.
Thus, my understanding is that I should create an array of size n (where n is prime), and insert an empty linked list into each position in the array. Then, I use my hash function to hash strings and insert them into the corresponding linked list in the proper array position. I created my hash function, and so far my Dictionary constructor takes in a size and creates an array of that size (actually, of size 4999, both prime and large as discussed in class). Am I on the right track here? Should I now insert a new linked list into each position and then work on insert/remove methods?
What you have sounds good so far.
Bear in mind that an array of object references has each cell null by default, and you can write your insert and remove functions to work with that. If you choose to create a linked list object that contains no data (sometimes called a sentinel node) it may be advantageous to create a single immutable (read-only) instance to put in every empty slot, rather than create 4,999 separate instances with new (where most don't hold any data).
It sounds like you are on the right track.
Some extra pointers:
It's not worth creating a LinkedList in each bucket until it is actually used. So you can leave the buckets as null until they are added to. Just remember to write your accessor functions to take account of this.
It's not always efficient to create a large array immediately. It can be better to start with a small array, keep track of the capacity used, and enlarge the array when necessary (which involves re-bucketing the values into the new array)
It's a good idea to make your class implement the whole of the Map<K,V> interface - just to get some practice implementing the other standard Java collection methods.

Java collection insertion: Set vs. List

I'm thinking about filling a collection with a large amount of unique objects.
How is the cost of an insert in a Set (say HashSet) compared to an List (say ArrayList)?
My feeling is that duplicate elimination in sets might cause a slight overhead.
There is no "duplicate elimination" such as comparing to all existing elements. If you insert into hash set, it's really a dictionary of items by hash code. There's no duplicate checking unless there already are items with the same hash code. Given a reasonable (well-distributed) hash function, it's not that bad.
As Will has noted, because of the dictionary structure HashSet is probably a bit slower than an ArrayList (unless you want to insert "between" existing elements). It also is a bit larger. I'm not sure that's a significant difference though.
You're right: set structures are inherently more complex in order to recognize and eliminate duplicates. Whether this overhead is significant for your case should be tested with a benchmark.
Another factor is memory usage. If your objects are very small, the memory overhead introduced by the set structure can be significant. In the most extreme case (TreeSet<Integer> vs. ArrayList<Integer>) the set structure can require more than 10 times as much memory.
If you're certain your data will be unique, use a List. You can use a Set to enforce this rule.
Sets are faster than Lists if you have a large data set, while the inverse is true for smaller data sets. I haven't personally tested this claim.
Which type of List?
Also, consider which List to use. LinkedLists are faster at adding, removing elements.
ArrayLists are faster at random access (for loops, etc), but this can be worked around using the Iterator of a LinkedList. ArrayLists are are much faster at: list.toArray().
You have to compare concrete implementations (for example HashSet with ArrayList), because the abstract interfaces Set/List don't really tell you anything about performance.
Inserting into a HashSet is a pretty cheap operation, as long as the hashCode() of the object to be inserted is sane. It will still be slightly slower than ArrayList, because it's insertion is a simple insertion into an array (assuming you insert in the end and there's still free space; I don't factor in resizing the internal array, because the same cost applies to HashSet as well).
If the goal is the uniqueness of the elements, you should use an implementation of the java.util.Set interface. The class java.util.HashSet and java.util.LinkedHashSet have O(alpha) (close to O(1) in the best case) complexity for insert, delete and contains check.
ArrayList have O(n) for object (not index) contains check (you have to scroll through the whole list) and insertion (if the insertion is not in tail of the list, you have to shift the whole underline array).
You can use LinkedHashSet that preserve the order of insertion and have the same potentiality of HashSet (takes up only a bit more of memory).
I don't think you can make this judgement simply on the cost of building the collection. Other things that you need to take into account are:
Is the input dataset ordered? Is there a requirement that the output data structure preserves insertion order?
Is there a requirement that the output data structure is ordered (or reordered) based on element values?
Will the output data structure be subsequently modified? How?
Is there a requirement that the output data structure is duplicate free if other elements are added subsequently?
Do you know how many elements are likely to be in the input dataset?
Can you measure the size of the input dataset? (Or is it provided via an iterator?)
Does space utilization matter?
These can all effect your choice of data structure.
Java List:
If you don't have such requirement that you have to keep duplicate or not. Then you can use List instead of Set.
List is an interface in Collection framework. Which extends Collection interface. and ArrayList, LinkedList is the implementation of List interface.
When to use ArrayList or LinkedList
ArrayList: If you have such requirement that in your application mostly work is accessing the data. Then you should go for ArrayList. because ArrayList implements RtandomAccess interface which is Marker Interface. because of Marker interface ArrayList have capability to access the data in O(1) time. and you can use ArrayList over LinkedList where you want to get data according to insertion order.
LinkedList: If you have such requirement that your mostly work is insertion or deletion. Then you should use LinkedList over the ArrayList. because in LinkedList insertion and deletion happen in O(1) time whereas in ArrayList it's O(n) time.
Java Set:
If you have requirement in your application that you don't want any duplicates. Then you should go for Set instead of List. Because Set doesn't store any duplicates. Because Set works on the principle of Hashing. If we add object in Set then first it checks object's hashCode in the bucket if it's find any hashCode present in it's bucked then it'll not add that object.

confusing java data structures

Maybe the title is not appropriate but I couldn't think of any other at this moment. My question is what is the difference between LinkedList and ArrayList or HashMap and THashMap .
Is there a tree structure already for Java(ex:AVL,red-black) or balanced or not balanced(linked list). If this kind of question is not appropriate for SO please let me know I will delete it. thank you
ArrayList and LinkedList are implementations of the List abstraction. The first holds the elements of the list in an internal array which is automatically reallocated as necessary to make space for new elements. The second constructs a doubly linked list of holder cells, each of which refers to a list element. While the respective operations have identical semantics, they differ considerably in performance characteristics. For example:
The get(int) operation on an ArrayList takes constant time, but it takes time proportional to the length of the list for a LinkedList.
Removing an element via the Iterator.remove() takes constant time for a LinkedList, but it takes time proportional to the length of the list for an ArrayList.
The HashMap and THashMap are both implementations of the Map abstraction that are use hash tables. The difference is in the form of hash table data structure used in each case. The HashMap class uses closed addressing which means that each bucket in the table points to a separate linked list of elements. The THashMap class uses open addressing which means that elements that hash to the same bucket are stored in the table itself. The net result is that THashMap uses less memory and is faster than HashMap for most operations, but is much slower if you need the map's set of key/value pairs.
For more detail, read a good textbook on data structures. Failing that, look up the concepts in Wikipedia. Finally, take a look at the source code of the respective classes.
Read the API docs for the classes you have mentioned. The collections tutorial also explains the differences fairly well.
java.util.TreeMap is based on a red-black tree.
Regarding the lists:
Both comply with the List interface, but their implementation is different, and they differ in the efficiency of some of their operations.
ArrayList is a list stored internally as an array. It has the advantage of random access, but a single item addition is not guaranteed to run in constant time. Also, removal of items is inefficient.
A LinkedList is implemented as a doubly connected linked list. It does not support random access, but removing an item while iterating through it is efficient.
As I remember, both (LinkedList and ArrayList) are the lists. But they have defferent inner realization.

Categories

Resources