When we create a Collection (ArrayList,HashMap) in Java, does Java internally create some kind of index for faster retrieval of data ? In Oracle we have to manually create indexes but what is the technique (if any) used in Java
For ArrayList, each Object has a unique index (even duplicate objects).
An object can easily be accessed by its index using ArrayList.get(). The index is based on the order Objects are added (assuming you haven't sorted the ArrayList or otherwise changed the order). When an object is removed from an ArrayList, all elements in front of it (with a larger index) are shifted to the left so that their indices become index - 1.
A HashMap uses a slightly more complex indexing scheme...
For a HashMap, all indexing information is hidden from you first of all, so you don't really need to know this unless you want to understand its internal workings (which is a good thing!) It does use indexing however... HashMaps use an array of Entrys (its own implementation of Map.Entry) to store information. Entry represents a node in a linked list (not to be confused with the object java.util.LinkedList) and it stores a key, a value, and the next node in the linked list.
The index of an entry in a HashMap is simply h & (length - 1) where h is the hashCode of the key, passed through a custom hashing method internal to the java.util package (you won't be able to access it), and length is a power-of-two integer representing the size of the array of Entrys (this will automatically grow if need be).
Of course there may be some collisions if two keys end up computing the same hash. This is why HashMap uses an array of linked lists. In case of a collision, where two Entrys have the same hash, one can be tagged to the end of the other.
To get an object in HashMap, the index is calculated from the key you provide through get(key) and the relevant Entry is retrieved from the array of Entrys. Now that the map has the first node in a linked list, it will iterate over all elements of this linked list until it finds the key equal to the key you provided.
yes.. Java does hide these implementation details. But for the performance that java provides, there may be some indexing technique internally used. When you refer to "Oracle" I believe its the SQL or database software and not a language like Java.
Related
I read that HashMap has a backing array, where entries are stored (marked with bucket number, initial size 16). Arrays are ordered, and I can call get(n) to get the element at nth position. Then why is HashMap unordered and has no get(n) method?
It depends on your view of what ordered means.
Indeed HashMapss internally use an array or another collection that has a fixed ordering. However the order has nothing to do with insertion order or something like that. The elements are ordered, for example, in increasing size of their hash-values and they have nothing to do with some actual ordering on the elements themselves.
So HashMaps indeed have something like a get(n) method if you think of n being the hash-value of the key-element. The method is called get(*key*) and it first computes the hash-value of the given key-element and then looks the value up on the internal structure by using get(*hash-value*) on it.
Here is an image a quick search yield that shows the structure of HashSets:
Note that HashSets are kinda the same than HashMaps, they use the same technique and the same image applies. But instead of just inserting an element a map inserts a container that is identified by the key and additionally holds a value.
Just as a small overview. A hash-function is a function that given an object computes a small value, the hash-value out of it, using its properties. The computation usually can be done fast and a lookup on the internal array at the position given by the hash-value is thus also fast.
To your specific question, as an user of a HashMap you generally are not interested in what elements specifically hide behind hash-value 1 or 2 and so on, that is why they did not include such a method. However if you truly need to do that for a special application or so than you can always try to use Reflection to access the internals of your HashMap or you could also just write a small wrapper around the class that provides such a method.
A HashMap is divided into individual buckets. Buckets are initially backed by an array, however if the buckets get too large then they are converted to tree structures which are sorted based on hash codes. That fact alone destroys any guarantee it could make about preserving insertion order.
If you'd like to know more about how it's implemented, you can look at my answer to this question: HashMap Java 8 implementation
In this question How can I efficiently select a Standard Library container in C++11? is a handy flow chart to use when choosing C++ collections.
I thought that this was a useful resource for people who are not sure which collection they should be using so I tried to find a similar flow chart for Java and was not able to do so.
What resources and "cheat sheets" are available to help people choose the right Collection to use when programming in Java? How do people know what List, Set and Map implementations they should use?
Since I couldn't find a similar flowchart I decided to make one myself.
This flow chart does not try and cover things like synchronized access, thread safety etc or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.
This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.
Other resources
Probably the most useful other reference is the following page from the oracle documentation which describes each Collection.
HashSet vs TreeSet
There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
ArrayList vs LinkedList
Detailed discussion: When to use LinkedList over ArrayList?
Summary of the major non-concurrent, non-synchronized collections
Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (random).
Set: An interface representing a Collection with no duplicates.
HashSet: A Set backed by a Hashtable. Fastest and smallest memory usage, when ordering is unimportant.
LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
EnumSet: An extremely fast and efficient Set customized for a single enum type.
List: An interface representing a Collection whose elements are ordered and each have a numeric index representing its position, where zero is the first element, and (length - 1) is the last.
ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size exceeds capacity (when the (capacity + 1)-th element is added), the array is recreated with a new capacity of (new length * 1.5)--this recreation is fast, since it uses System.arrayCopy(). Deleting and inserting/adding elements requires all neighboring elements (to the right) be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find it's location. In most situations, an ArrayList is preferred over a LinkedList.
LinkedList: A List backed by a set of objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal is a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
Map: An interface representing an Collection where each element has an identifying "key"--each element is a key-value pair.
HashMap: A Map where keys are unordered, and backed by a Hashtable.
LinkedhashMap: Keys are ordered by insertion order.
TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
Stack: An interface that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out).
Deque: Short for "double ended queue", usually pronounced "deck". A linked list that is typically only added to and read from either end (not the middle).
Basic collection diagrams:
Comparing the insertion of an element with an ArrayList and LinkedList:
Even simpler picture is here. Intentionally simplified!
Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.
List is an indexed collection of data where each element has an index. Something like the array, but more flexible.
Data in the list keep the order of insertion.
Typical operation: get the n-th element.
Set is a bag of elements, each elements just once (the elements are distinguished using their equals() method.
Data in the set are stored mostly just to know what data are there.
Typical operation: tell if an element is present in the list.
Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)
Data in Map are searchable by their key.
Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
The differences
Set vs. Map: in Set you search data by themselves, whilst in Map by their key.
N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.
List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key which os of any type (typically: ID)
List vs. Set: in List the elements are bound by their position and can be duplicate, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet)
It is simple: if you need to store values with keys mapped to them go for the Map interface, otherwise use List for values which may be duplicated and finally use the Set interface if you don’t want duplicated values in your collection.
Here is the complete explanation http://javatutorial.net/choose-the-right-java-collection , including flowchart etc
Map
If choosing a Map, I made this table summarizing the features of each of the ten implementations bundled with Java 11.
Common collections, Common collections
My question is about what are the fundamental/concrete data structure (like array) used in implementing abstract data structure implementations like variations maps/trees?
I'm looking for what's used really in java collection, not theoretical answers.
Based on quick code review of Sun/Oracle JDK. You can easily find the details yourself.
Lists/queues
ArrayList
Growing Object[] elementData field. Can hold 10 elements by default, grows by around 50% when cannot hold more objects, copying the old array to a bigger new one. Does not shrink when removing items.
LinkedList
Reference to Entry which in turns hold reference to actual element, previous and next element (if any).
ArrayDeque
Similar to ArrayList but also holding two pointers to internal E[] elements array - head and tail. Both adding and removing elements on either side is just a matter of moving these pointers. The array grows by 200% when is too small.
Maps
HashMap
Growing Entry[] table field holding so called buckets. Each bucket contains linked list of entries having the same hash of the key module table size.
TreeMap
Entry<K,V> root reference holding the root of the red-black balanced tree.
ConcurrentHashMap
Similar to HashMap but access to each bucket (called segment) is synchronized by an independent lock.
Sets
TreeSet
Uses TreeMap underneath (!)
HashSet
Uses HashMap underneath (!)
BitSet
Uses long[] words field to be as memory efficient as possible. Up to 64 bits can be stored in one element.
There is of course one answer for each implementation. Look at the javadocs, they often describe these things. http://docs.oracle.com/javase/7/docs/api/
In java, List and Map are using in collections. But i couldn't understand at which situations we should use List and which time use Map. What is the major difference between both of them?
Now would be a good time to read the Java collections tutorial - but fundamentally, a list is an ordered sequence of elements which you can access by index, and a map is a usually unordered mapping from keys to values. (Some maps preserve insertion order, but that's implementation-specific.)
It's usually fairly obvious when you want a key/value mapping and when you just want a collection of elements. It becomes less clear if the key is part of the value, but you want to be able to get at an item by that key efficiently. That's still a good use case for a map, even though in some senses you don't have a separate collection of keys.
There's also Set, which is a (usually unordered) collection of distinct elements.
Map is for Key:Value pair kind of data.for instance if you want to map student roll numbers to their names.
List is for simple ordered collection of elements which allow duplicates.
for instance to represent list of student names.
Map Interface
A Map cares about unique identifiers. You map a unique key (the ID) to a specific
value, where both the key and the value are, of course, objects.
The Map implementations let you do things like search for a
value based on the key, ask for a collection of just the values, or ask for a collection
of just the keys. Like Sets, Maps rely on the equals() method to determine whether
two keys are the same or different.
List Interface
A List cares about the index. The one thing that List has that non-lists don't have
is a set of methods related to the index. Those key methods include things like
get(int index), indexOf(Object o), add(int index, Object obj), and so
on. All three List implementations are ordered by index position—a position that
you determine either by setting an object at a specific index or by adding it without
specifying position, in which case the object is added to the end.
list is a linked list, where every object is connected to the next one via pointers. the time it takes to insert a new object to the list is O(1) but the rest of operations on it take longer.
the good thing about it is that it takes exactly the amount of memory you need and not even on byte more than that.
Maps are a data structure that has an array and each entry to the array is calculated with a hashFunction(key) that calculates the location according to the key. almost every operation in a Map taks O(1) (except inserting when there are 2 identical keys) but the space complexity is fairly large.
for more reading try wikipedia's HashMap and linked list
HashList is a data structure storing objects in a hash table and a list.it is a combination of hashmap and doubly linked list. acess will be faster. HashMap is hash table implementation of map interface it is same as HashTable except that it is unsynchronized and allow null values. List is an ordered collection and it allow nulls and duplicates in it. positional acess is possible. Set is a collection that doesn't allow duplicates, it may allow at most one null element. same as our mathematical set.
List is just an ordered collectiom(a sequence). Check this list documentation .You can access elements by their integer index (position in the list), and search for elements in the list.
Also lists allow duplicate elements and multiple NULL elements.
Map is an object that maps the values to the keys. Check this map documentation. A map cannot contain duplicate keys; each key can map to at most one value.
List - This datastructure is used to contain list of elements.
In case you need list of elements and the list may contain duplicate values,
then you have to use List.
Map - It contains data as key value pair. When you have to store data
in key value pair,so that latter you can retrieve data using the key,
you have to use Map data structure.
List implementation - ArrayList, LinkedList
Map implementation - HashMap, TreeMap
In comparison HashMap to ArrayList -
A hash map is the fastest data structure if you want to get all nodes for a page. The list of nodes can be fetched in constant time (O(1)) while with lists the time is O(n) (n=number of pages, faster on sorted lists but never getting near O(1))
What is the reason why we cannot always use a HashMap, even though it is much more efficient than ArrayList or LinkedList in add,remove operations, also irrespective of the number of the elements.
I googled it and found some reasons, but there was always a workaround for using HashMap, with advantages still alive.
Lists represent a sequential ordering of elements.
Maps are used to represent a collection of key / value pairs.
While you could use a map as a list, there are some definite downsides of doing so.
Maintaining order:
A list by definition is ordered. You add items and then you are able to iterate back through the list in the order that you inserted the items. When you add items to a HashMap, you are not guaranteed to retrieve the items in the same order you put them in. There are subclasses of HashMap like LinkedHashMap that will maintain the order, but in general order is not guaranteed with a Map.
Key/Value semantics:
The purpose of a map is to store items based on a key that can be used to retrieve the item at a later point. Similar functionality can only be achieved with a list in the limited case where the key happens to be the position in the list.
Code readability
Consider the following examples.
// Adding to a List
list.add(myObject); // adds to the end of the list
map.put(myKey, myObject); // sure, you can do this, but what is myKey?
map.put("1", myObject); // you could use the position as a key but why?
// Iterating through the items
for (Object o : myList) // nice and easy
for (Object o : myMap.values()) // more code and the order is not guaranteed
Collection functionality
Some great utility functions are available for lists via the Collections class. For example ...
// Randomize the list
Collections.shuffle(myList);
// Sort the list
Collections.sort(myList, myComparator);
Lists and Maps are different data structures. Maps are used for when you want to associate a key with a value and Lists are an ordered collection.
Map is an interface in the Java Collection Framework and a HashMap is one implementation of the Map interface. HashMap are efficient for locating a value based on a key and inserting and deleting values based on a key. The entries of a HashMap are not ordered.
ArrayList and LinkedList are an implementation of the List interface. LinkedList provides sequential access and is generally more efficient at inserting and deleting elements in the list, however, it is it less efficient at accessing elements in a list. ArrayList provides random access and is more efficient at accessing elements but is generally slower at inserting and deleting elements.
I will put here some real case examples and scenarios when to use one or another, it might be of help for somebody else:
HashMap
When you have to use cache in your application. Redis and membase are some type of extended HashMap. (Doesn't matter the order of the elements, you need quick ( O(1) ) read access (a value), using a key).
LinkedList
When the order is important (they are ordered as they were added to the LinkedList), the number of elements are unknown (don't waste memory allocation) and you require quick insertion time ( O(1) ). A list of to-do items that can be listed sequentially as they are added is a good example.
The downfall of ArrayList and LinkedList is that when iterating through them, depending on the search algorithm, the time it takes to find an item grows with the size of the list.
The beauty of hashing is that although you sacrifice some extra time searching for the element, the time taken does not grow with the size of the map. This is because the HashMap finds information by converting the element you are searching for, directly into the index, so it can make the jump.
Long story short...
LinkedList: Consumes a little more memory than ArrayList, low cost for insertions(add & remove)
ArrayList: Consumes low memory, but similar to LinkedList, and takes extra time to search when large.
HashMap: Can perform a jump to the value, making the search time constant for large maps. Consumes more memory and takes longer to find the value than small lists.