Why Linkedlist in hashmap?Why not other implementation of List? - java

As HashMap uses LinkedList when two different keys produces a same hashCode.But I was wondering what makes LinkedList a better candidate here over other implementation of List.Why not ArrayList because ArrayList uses Array internally and arrays have a faster iteration compared to a LinkedList.

Collisions in hash maps are an exception, rather than a rule. When your hash function is reasonably good, as it should be, there should be very few collisions.
If we used ArrayList for the buckets, with most lists being empty or having exactly one element, this would be a rather big waste of resources. With array lists allocating multiple members upfront, you would end up paying forward for multiple collisions that you may not have in the future.
Moreover, removing from array lists is cheap only when the last element gets deleted. When the first one gets deleted, you end up paying for the move of all elements.
Linked lists are free from these problems. Insertion is O(1), deletion is O(1), and they use exactly as many nodes as you insert. The memory overhead of the next/prior links is not too big a price to pay for this convenience.

The problem with an arrayList is that you can't fast remove an element: you have to move all the elements after the one you remove.
With a linkedList, removing an element is merely changing a reference from one node to the new next one, skipping the removed one.
The difference is huge. When you want to have a list and be able to fast remove elements, don't use an arraylist, the usual choice is the linked list.

Why not ArrayList because ArrayList uses Array internally and arrays have a faster iteration compared to a LinkedList.
And ArrayList is much slower to modify. So they made a judgement call and went with LinkedList.

Related

Benefits of linked list over array [duplicate]

This question already has answers here:
When to use LinkedList over ArrayList in Java?
(33 answers)
Closed 10 months ago.
I really don't see how linked lists are better than array , the insertion and deletion complexities are same , eg. , In array the insertion at rear is O(1) while for linked lists the insertion at head is O(1) , and simillarly insertion in arrays at front is O(n) but for the later it is O(n) to insert at the rear end.
Apart from the only fact that linked lists are dynamic in nature ,I dont see any benefits of linked lists over arrays. Moreover , I can use a dynamic array to counter that problem.
Again Array also have better results when we want to access an element.
So can anybody please tell me why are linked lists better than array? And if they are not better , then why do we use it?
No data structure is universally better than another data structure. There are benefits and drawbacks and which is better depends on what benefits and drawbacks are more important for your use case.
the insertion and deletion complexities are same
Firstly, it isn't possible to insert or delete elements of arrays at all. Their size remains constant. But I'll assume that you meant "dynamic array" data structure i.e. std::vector and java.util.ArrayList (not to be confused with dynamically allocated array which is also called "dynamic array").
The insertion and deletion complexities between linked lists and dynamic array are not the same.
but for the later it is O(n) to insert at the rear end.
Given a an iterator to a linked linked list, you can insert after the pointed element in constant time. Hence, if you maintain an iterator to the end of the list, you can insert there in constant time.
An important advantage of linked list over the dynamic array, besides the complexity, is that the elements remain stable in memory. In vector, if adding an element exceeds the capacity, then all iterators and references to existing elements are invalidated. Iterators as well as references to linked list elements remain valid until the element is erased. If you need this property, then using a dynamic array is not an option.
Linked lists, as independent data structures, are rarely used.
Quite often, however, objects are linked into a list by incorporating next and maybe prev pointers directly into the objects themselves. As a data structure, that is often referred to as an "intrusive" linked list.
Sometimes this is used to add features to existing data structures. In Java you can see this in LinkedHashMap, which links the map entries together into a list, preserving their insertion or access order and allowing you to use it as an LRU cache. Similarly, leaf nodes in a B+tree are often linked onto a list to simplify traversal. There are many examples.

Why don't we count linear search cost as a prerequisite bottleneck for the insertion operation of a linked list, compared to ArrayList?

I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for the ArrayList, even though in both cases, the linear operations may not be invoked or not invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?
This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of a item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc) that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessary natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given an pointer (iterator) to an object, you can remove and add from linked lists in constant time, but from an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo" that implies both a search and a delete operation. Both array-based and linked lists taken O(n) for search, so they don't vary here - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well because actually using Iterator is actually very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support of pointers to list elements is pretty half-hearted an so it's tough to leverage the constant time operations that linked list offers, except in cases such as adding to the front of a list, or deleting items during iteration.
It's not limited to lists, either - the ConcurrentQueue is also a linked structure which supports constant time deletes, but you can't reliably use that ability from Java.
If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. Most of the time it simply does an arraycopy to shift the indices (which is a low-level shift that is constant time). Only when you need to resize the backing array will it take linear time.

Why does Hashmap Internally use LinkedList instead of Arraylist

Why does Hashmap internally use a LinkedList instead of an Arraylist when two objects are placed in the same bucket in the hash table?
Why does HashMap internally use s LinkedList instead of an Arraylist, when two objects are placed into the same bucket in the hash table?
Actually, it doesn't use either (!).
It actually uses a singly linked list implemented by chaining the hash table entries. (By contrast, a LinkedList is doubly linked, and it requires a separate Node object for each element in the list.)
So why am I nitpicking here? Because it is actually important ... because it means that the normal trade-off between LinkedList and ArrayList does not apply.
The normal trade-off is:
ArrayList uses less space, but insertion and removal of a selected element is O(N) in the worst case.
LinkedList uses more space, but insertion and removal of a selected element1 is O(1).
However, in the case of the private singly linked list formed by chaining together HashMap entry nodes, the space overhead is one reference (same as ArrayList), the cost of inserting a node is O(1) (same as LinkedList), and the cost of removing a selected node is also O(1) (same as LinkedList).
Relying solely on "big O" for this analysis is dubious, but when you look at the actual code, it is clear that what HashMap does beat ArrayList on performance for deletion and insertion, and is comparable for lookup. (This ignores memory locality effects.) And it also uses less memory for the chaining than either ArrayList or LinkedList was used ... considering that there are already internal entry objects to hold the key / value pairs.
But it gets even more complicated. In Java 8, they overhauled the HashMap internal data structures. In the current implementation, once a hash chain exceeds a certain length threshold, the implementation switches to using a binary tree representation if the key type implements Comparable.
1 - That is the insertion / deletion is O(1) if you have found the insertion / removal point. For example, if you are using the insert and remove methods on a LinkedList object's ListIterator.
This basically boils down to complexities of ArrayList and LinkedList.
Insertion in LinkedList (when order is not important) is O(1), just append to start.
Insertion in ArrayList (when order is not important) is O(N) ,traverse to end and there is also resizing overhead.
Removal is O(n) in LinkedList, traverse and adjust pointers.
Removal in arraylist could be O(n^2) , traverse to element and shift elements or resize the Arraylist.
Contains will be O(n) in either cases.
When using a HashMap we will expect O(1) operations for add, remove and contains. Using ArrayList we will incur higher cost for the add, remove operations in buckets
Short Answer : Java uses either LinkedList or ArrayList (whichever it finds appropriate for the data).
Long Answer
Although sorted ArrayList looks like the obvious way to go, there are some practical benefits of using LinkedList instead.
We need to remember that LinkedList chain is used only when there is collision of keys.
But as a definition of Hash function : Collisions should be rare
In rare cases of collisions we have to choose between Sorted ArrayList or LinkedList.
If we compare sorted ArrayList and LinkedList there are some clear trade-offs
Insertion and Deletion : Sorted ArrayList takes O(n), but LinkedList takes constant O(1)
Retrieval : Sorted ArrayList takes O(logn) and LinkedList takes 0(n).
Now, its clear that LinkedList are better than sorted ArrayList during insertion and deletion, but they are bad while retrieval.
In there are fewer collisions, sorted ArrayList brings less value (but more over head).
But when the collisions are more frequent and the collided elements list become large(over certain threshold) Java changes the collision data structure from LinkedList to ArrayList.

Java collection insertion: Set vs. List

I'm thinking about filling a collection with a large amount of unique objects.
How is the cost of an insert in a Set (say HashSet) compared to an List (say ArrayList)?
My feeling is that duplicate elimination in sets might cause a slight overhead.
There is no "duplicate elimination" such as comparing to all existing elements. If you insert into hash set, it's really a dictionary of items by hash code. There's no duplicate checking unless there already are items with the same hash code. Given a reasonable (well-distributed) hash function, it's not that bad.
As Will has noted, because of the dictionary structure HashSet is probably a bit slower than an ArrayList (unless you want to insert "between" existing elements). It also is a bit larger. I'm not sure that's a significant difference though.
You're right: set structures are inherently more complex in order to recognize and eliminate duplicates. Whether this overhead is significant for your case should be tested with a benchmark.
Another factor is memory usage. If your objects are very small, the memory overhead introduced by the set structure can be significant. In the most extreme case (TreeSet<Integer> vs. ArrayList<Integer>) the set structure can require more than 10 times as much memory.
If you're certain your data will be unique, use a List. You can use a Set to enforce this rule.
Sets are faster than Lists if you have a large data set, while the inverse is true for smaller data sets. I haven't personally tested this claim.
Which type of List?
Also, consider which List to use. LinkedLists are faster at adding, removing elements.
ArrayLists are faster at random access (for loops, etc), but this can be worked around using the Iterator of a LinkedList. ArrayLists are are much faster at: list.toArray().
You have to compare concrete implementations (for example HashSet with ArrayList), because the abstract interfaces Set/List don't really tell you anything about performance.
Inserting into a HashSet is a pretty cheap operation, as long as the hashCode() of the object to be inserted is sane. It will still be slightly slower than ArrayList, because it's insertion is a simple insertion into an array (assuming you insert in the end and there's still free space; I don't factor in resizing the internal array, because the same cost applies to HashSet as well).
If the goal is the uniqueness of the elements, you should use an implementation of the java.util.Set interface. The class java.util.HashSet and java.util.LinkedHashSet have O(alpha) (close to O(1) in the best case) complexity for insert, delete and contains check.
ArrayList have O(n) for object (not index) contains check (you have to scroll through the whole list) and insertion (if the insertion is not in tail of the list, you have to shift the whole underline array).
You can use LinkedHashSet that preserve the order of insertion and have the same potentiality of HashSet (takes up only a bit more of memory).
I don't think you can make this judgement simply on the cost of building the collection. Other things that you need to take into account are:
Is the input dataset ordered? Is there a requirement that the output data structure preserves insertion order?
Is there a requirement that the output data structure is ordered (or reordered) based on element values?
Will the output data structure be subsequently modified? How?
Is there a requirement that the output data structure is duplicate free if other elements are added subsequently?
Do you know how many elements are likely to be in the input dataset?
Can you measure the size of the input dataset? (Or is it provided via an iterator?)
Does space utilization matter?
These can all effect your choice of data structure.
Java List:
If you don't have such requirement that you have to keep duplicate or not. Then you can use List instead of Set.
List is an interface in Collection framework. Which extends Collection interface. and ArrayList, LinkedList is the implementation of List interface.
When to use ArrayList or LinkedList
ArrayList: If you have such requirement that in your application mostly work is accessing the data. Then you should go for ArrayList. because ArrayList implements RtandomAccess interface which is Marker Interface. because of Marker interface ArrayList have capability to access the data in O(1) time. and you can use ArrayList over LinkedList where you want to get data according to insertion order.
LinkedList: If you have such requirement that your mostly work is insertion or deletion. Then you should use LinkedList over the ArrayList. because in LinkedList insertion and deletion happen in O(1) time whereas in ArrayList it's O(n) time.
Java Set:
If you have requirement in your application that you don't want any duplicates. Then you should go for Set instead of List. Because Set doesn't store any duplicates. Because Set works on the principle of Hashing. If we add object in Set then first it checks object's hashCode in the bucket if it's find any hashCode present in it's bucked then it'll not add that object.

java linkedlist slower than arraylist when adding elements?

i thought linkedlists were supposed to be faster than an arraylist when adding elements? i just did a test of how long it takes to add, sort, and search for elements (arraylist vs linkedlist vs hashset). i was just using the java.util classes for arraylist and linkedlist...using both of the add(object) methods available to each class.
arraylist out performed linkedlist in filling the list...and in a linear search of the list.
is this right? did i do something wrong in the implementation maybe?
***************EDIT*****************
i just want to make sure i'm using these things right. here's what i'm doing:
public class LinkedListTest {
private List<String> Names;
public LinkedListTest(){
Names = new LinkedList<String>();
}
Then I just using linkedlist methods ie "Names.add(strings)". And when I tested arraylists, it's nearly identical:
public class ArrayListTest {
private List<String> Names;
public ArrayListTest(){
Names = new ArrayList<String>();
}
Am I doing it right?
Yes, that's right. LinkedList will have to do a memory allocation on each insertion, while ArrayList is permitted to do fewer of them, giving it amortized O(1) insertion. Memory allocation looks cheap, but may be actually be very expensive.
The linear search time is likely slower in LinkedList due to locality of reference: the ArrayList elements are closer together, so there are fewer cache misses.
When you plan to insert only at the end of a List, ArrayList is the implementation of choice.
Remember that:
there's a difference in "raw" performance for a given number of elements, and in how different structures scale;
different structures perform differently at different operations, and that's essentially part of what you need to take into account in choosing which structure to use.
So, for example, a linked list has more to do in adding to the end, because it has an additional object to allocate and initialise per item added, but whatever that "intrinsic" cost per item, both structures will have O(1) performance for adding to the end of the list, i.e. have an effectively "constant" time per addition whatever the size of the list, but that constant will be different between ArrayList vs LinkedList and likely to be greater for the latter.
On the other hand, a linked list has constant time for adding to the beginning of the list, whereas in the case of an ArrayList, the elements must be "shuftied" along, an operation that takes some time proportional to the number of elements. But, for a given list size, say, 100 elements, it may still be quicker to "shufty" 100 elements than it is to allocate and initialise a single placeholder object of the linked list (but by the time you get to, say, a thousand or a million objects or whatever the threshold is, it won't be).
So in your testing, you probably want to consider both the "raw" time of the operations at a given size and how these operations scale as the list size grows.
Why did you think LinkedList would be faster? In the general case, an insert into an array list is simply a case of updating the pointer for a single array cell (with O(1) random access). The LinkedList insert is also random access, but must allocate an "cell" object to hold the entry, and update a pair of pointers, as well as ultimately setting the reference to the object being inserted.
Of course, periodically the ArrayList's backing array may need to be resized (which won't be the case if it was chosen with a large enough initial capacity), but since the array grows exponentially the amortized cost will be low, and is bounded by O(lg n) complexity.
Simply put - inserts into array lists are much simpler and therefore much faster overall.
Linked list may be slower than array list in these cases for a few reasons. If you are inserting into the end of the list, it is likely that the array list has this space already allocated. The underlying array is usually increased in large chunks, because this is a very time-consuming process. So, in most cases, to add an element in the back requires only sticking in a reference, whereas the linked list needs the creation of a node. Adding in the front and the middle should give different performance in for both types of list.
Linear traversal of the list will always be faster in an array based list because it must only traverse the array normally. This requires one dereferencing operation per cell. In the linked list, the nodes of the list must also be dereferenced, taking double the amount of time.
When adding an element to the back of a LinkedList (in Java LinkedList is actually a doubly linked list) it is an O(1) operation as is adding an element to the front of it. Adding an element on the ith position is roughly an O(i) operation.
So, if you were adding to the front of the list, a LinkedList would be significantly faster.
ArrayList is faster in accessing random index data, but slower when inserting elements in the middle of the list, because using linked list you just have to change reference values. But in an array list you have to copy all elements after the inserted index, one index behind.
EDIT: Is not there a linkedlist implementation which keeps the last element in mind? Doing it this way would speed up inserting at the end using linked list.

Categories

Resources