How to efficiently remove an element from a Java LinkedList

I have an algorithm where I pass through nodes in a graph in a certain way, occasionally passing through the same node several times, and I need to form a list of the nodes passed, such that a node appears once for the last time I passed it.
For instance, if I passed through nodes A -> B -> C -> A -> C, the list I need in the end is [B, A, C].
What I wanted to do was to use a LinkedList, such that every node in the graph will contain a reference to its node in the LinkedList. Then, every time I pass through a node, I will remove its corresponding node from the LinkedList and insert it again into the end of the LinkedList, and the complexity of the operation will only be O(1).
However, when I began implementing this, I ran into a problem: apparently, the Java LinkedList class does not allow me to access its actual list nodes. Using the regular remove methods of LinkedList to remove the list node containing a given graph node will be O(n) instead of O(1), negating the whole point of using a LinkedList to begin with.
Naturally, I can implement a linked list myself, but I would rather avoid that - it seems to me that if I have to implement a linked list in Java, I'm doing something wrong.
So, is there a way to solve this problem without implementing LinkedList on my own? Is there something that I'm missing?

It seems you are expecting a built-in approach; I don't think there is any Collection that provides such functionality out of the box. You will have to implement it on your own, as #MartijinCourteaux suggested. Or:
Use a sorted set collection like TreeSet<E>, which supports add, remove, and contains at a cost of O(log n).
Use LinkedHashSet<E>. Like HashSet<E>, LinkedHashSet has O(1) expected performance for add, contains, and remove, though it is likely to be slightly slower than HashSet due to the added expense of maintaining the linked list. Still, you avoid the higher cost associated with TreeSet. Be aware, however, that the iteration order is not affected if an element is re-inserted into the set, so remove the earlier occurrence of an element before re-inserting it (see the sketch below).
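For example, a minimal sketch of the LinkedHashSet idea applied to the original A -> B -> C -> A -> C walk; the visit helper and class name are made up for illustration:

import java.util.LinkedHashSet;
import java.util.Set;

public class LastVisitOrder {
    // Remove any earlier occurrence before re-inserting, so the set ends up
    // ordered by the last time each node was visited.
    static <T> void visit(Set<T> visited, T node) {
        visited.remove(node); // O(1) expected
        visited.add(node);    // re-inserted at the end of the iteration order
    }

    public static void main(String[] args) {
        Set<String> visited = new LinkedHashSet<>();
        for (String node : new String[] {"A", "B", "C", "A", "C"}) {
            visit(visited, node);
        }
        System.out.println(visited); // [B, A, C]
    }
}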

LinkedHashMap keeps the order in which entries were inserted and lets you remove an entry by its key and then put it back at the end. I think that is all you need.

Unless your list is large, just using a regular ArrayList will give fast performance even with the shuffling. You should also consider a HashSet if order is not important, a LinkedHashSet if insertion order matters, or a TreeSet if you want the elements sorted. They don't allow duplicate values but have good asymptotic performance for insert, delete, and contains.

Related

Fast LinkedList search and delete in Java

I am using Java's LinkedList in my project. I have to build a delete function that removes an element with a specified unique id (id is a field in my class) from the LinkedList. According to the official Java documentation, were I to use LinkedList.remove, the runtime would be O(n): the process happens in two steps, the first of which is a linear search with a runtime of O(n), followed by the actual delete, which takes O(1).
In an attempt to speed things up, I wanted to use a binary tree for lookup, where each node in the tree is (id, reference to the node in the LinkedList). I am not exactly sure how to implement this in Java. In C/C++, one could just store a pointer as a reference to the node in the linked list.
==
If you are wondering why I have to use LinkedList, it's because I am building an order-matching engine for exchanges. LinkedList offers superior runtime as far as insert is concerned. I am also using insertion sort to keep prices in the orderbook sorted. Priority queue does not suit my needs because I have to show the sorted order book in real time.
Have you seen the video of Stroustrup's conference talk where he showed that you should use std::vector unless you have measured a performance benefit of not using std::vector? He showed that std::vector is almost always the correct collection to use, and showed that it is faster than linked list even when inserting and deleting in the middle.
Now translate that to Java: use ArrayList unless you have measured better performance with something else.
Why is that? With modern processor architectures, there is a locality benefit: elements that you compare together, elements that you process together, are all stored next to each other in memory and are likely to be in the CPU's cache at the same time. This allows them to be fetched and written to much faster than when they're in main memory. This is not the case with a linked list, where elements are allocated individually and spread all over the place. (This locality benefit is much more pronounced in C++ where you have the actual objects next to each other, but it's still valid to a smaller extent in Java, where you have the references next to each other, albeit not the actual objects.)
Now with ArrayList, you can keep the orders sorted by price, and use binary search to insert an order in the right place.
If your performance measurement shows that LinkedList is preferable, then unfortunately Java doesn't give you access to the internal representation – the actual nodes – of the LinkedList, so you'll have to homebrew your own list.
Why are you using a List?
If you have a unique id for each object, why not put the objects in a Map with the id as the key? If you choose a HashMap as the implementation, removal is O(1). If you use a LinkedHashMap, you preserve insertion order as well.
LinkedList insertion is superior to....what?
HashMap get/put complexity
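A rough sketch of the map-keyed-by-id idea from the answer above; the ids and order descriptions are invented for illustration:

import java.util.LinkedHashMap;
import java.util.Map;

public class RemoveById {
    public static void main(String[] args) {
        // Key = unique order id, value = the order (a String here for brevity).
        // HashMap gives O(1) expected removal by key; LinkedHashMap additionally
        // preserves insertion order when iterating.
        Map<Integer, String> ordersById = new LinkedHashMap<>();
        ordersById.put(1, "buy 100 @ 10.5");
        ordersById.put(2, "sell 50 @ 10.7");
        ordersById.put(3, "buy 20 @ 10.4");

        ordersById.remove(2);                    // delete by unique id, no linear search
        System.out.println(ordersById.keySet()); // [1, 3]
    }
}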
You can easily solve this by having a small change.
First have an object that has your value and id as fields
class MyElement implements Comparable<MyElement> {
    int id, value;
    MyElement(int id, int value) { this.id = id; this.value = value; }
    // compareTo() sorts based on values
    public int compareTo(MyElement other) { return Integer.compare(value, other.value); }
    // equals() compares ids
    @Override public boolean equals(Object o) { return o instanceof MyElement && ((MyElement) o).id == id; }
    // hashCode() returns the id
    @Override public int hashCode() { return id; }
}
Now use a TreeSet to store these objects.
In this data structure the incoming objects are kept sorted, and insertion and deletion both have a time complexity of O(log n).
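A short usage sketch under the assumptions above (the constructor and sample values are added for illustration):

import java.util.TreeSet;

public class MyElementDemo {
    public static void main(String[] args) {
        TreeSet<MyElement> elements = new TreeSet<>();
        elements.add(new MyElement(1, 42));   // O(log n) insert, kept sorted by value
        elements.add(new MyElement(2, 7));
        elements.remove(new MyElement(2, 7)); // O(log n) delete, located via compareTo()
        System.out.println(elements.first().value); // 42
    }
}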
To preserve order by id and to get good performance use TreeMap. Put, remove and get operations will be O(log n).
EDIT:
To preserve the order of insertion of elements for each id, you can use a TreeMap<Integer, ArrayList<T>>, i.e. for each id you store the elements with that id in a list, in insertion order.
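Something along these lines, using String elements as a stand-in for T:

import java.util.ArrayList;
import java.util.TreeMap;

public class ElementsById {
    public static void main(String[] args) {
        // Keys stay sorted by id; each per-id list keeps insertion order.
        TreeMap<Integer, ArrayList<String>> byId = new TreeMap<>();
        byId.computeIfAbsent(42, k -> new ArrayList<>()).add("first element for id 42");
        byId.computeIfAbsent(42, k -> new ArrayList<>()).add("second element for id 42");
        byId.computeIfAbsent(7, k -> new ArrayList<>()).add("element for id 7");

        System.out.println(byId.firstKey()); // 7 (lowest id), found in O(log n)
        byId.remove(42);                     // O(log n) removal of everything under id 42
        System.out.println(byId.keySet());   // [7]
    }
}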

Why don't we count linear search cost as a prerequisite bottleneck for the insertion operation of a linked list, compared to ArrayList?

I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for the ArrayList, even though in both cases, the linear operations may not be invoked or not invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?
This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of an item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc.) that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessarily natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given a pointer (iterator) to an object, you can remove and add from linked lists in constant time, but on an array-like list these operations take linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and there the constant-time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo", that implies both a search and a delete operation. Both array-based and linked lists take O(n) for the search, so they don't differ there - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well, because actually using an Iterator is very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support for pointers to list elements is pretty half-hearted, and so it's tough to leverage the constant-time operations that a linked list offers, except in cases such as adding to the front of a list or deleting items during iteration.
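As a small illustration of the "deleting items during iteration" case (the filtering condition here is arbitrary):

import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        List<Integer> values = new LinkedList<>(Arrays.asList(5, 12, 7, 20, 3));

        // The iterator acts as the "pointer to a node": once it is positioned,
        // remove() just relinks the neighbours, which is O(1) on a LinkedList.
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            if (it.next() > 10) {
                it.remove();
            }
        }
        System.out.println(values); // [5, 7, 3]
    }
}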
It's not limited to lists, either - ConcurrentLinkedQueue is also a linked structure which supports constant-time deletes, but you can't reliably use that ability from Java.
If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst-case linear for an insert at an arbitrary position. Most of the time it simply does a System.arraycopy to shift the trailing elements, which is a low-level bulk operation with a very small constant factor; resizing the backing array adds a further full copy, but only occasionally.

a collection data structure to keep items sorted

I have a program that uses an ArrayList<T>, where the type T also implements Comparable<T>. I need to keep that list sorted.
For now, when I insert a new item, I add it to the ArrayList and then invoke Collections.sort(myArrayList).
Does sorting with Collections.sort every time I insert a new item seriously hurt runtime complexity?
Is there a better suited data structure I can use to always keep the list sorted? I know of a structure called a PriorityQueue but I also need to be able to get the list's elements by index.
EDIT:
In my specific case, inserting a new item happens much less often than getting an existing item, so good advice could also be to stay with the ArrayList, since it has constant-time access by index. But if you know of anything else...
It seems like Collections.sort is actually the way to go here, since when the collection is already almost sorted the sort will take little more than O(n) time.
Instead of using Collections.sort(myArrayList) after every insertion, you might want to do something a bit smarter, since you know that every time you insert an element your collection is already ordered.
Collections.sort(myArrayList) takes O(n log n) time, but you can do an ordered insert into an already-ordered collection in O(n) time using Collections.binarySearch. If the collection is sorted in ascending order, Collections.binarySearch returns the index of the element you are looking for if it exists, or (-(insertion point) - 1) otherwise. Before inserting an element, look for it with Collections.binarySearch (O(log n) time); from the result you can derive the index at which to insert the new element. You can then insert the element with add(index, element) in O(n) time. The whole insertion is bounded by the positional add, so you can do an ordered insert into an ArrayList in O(n) time.
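A sketch of that ordered-insert pattern; the helper name and sample values are made up:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderedInsert {
    // Insert value into an already-sorted list, keeping it sorted.
    // binarySearch costs O(log n); the positional add dominates at O(n).
    static <T extends Comparable<? super T>> void insertSorted(List<T> sorted, T value) {
        int index = Collections.binarySearch(sorted, value);
        if (index < 0) {
            index = -(index + 1); // decode the (-(insertion point) - 1) convention
        }
        sorted.add(index, value);
    }

    public static void main(String[] args) {
        List<Integer> prices = new ArrayList<>(Arrays.asList(10, 20, 30));
        insertSorted(prices, 25);
        System.out.println(prices); // [10, 20, 25, 30]
    }
}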
A List is an ordered collection, which means you need to be able to access elements by index. If a collection internally shuffles or sorts the elements, the insertion order won't be the same as the order of the elements in the internal data structure, so you can no longer depend on index-based access. Hence Sun didn't provide a SortedList or a TreeList class; that is why you use Collections.sort(..).
Apache commons-collections does provide a TreeList class but it is not a sorted List and is called so because it uses a tree data structure to store the elements internally. Check its documentation here - http://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/list/TreeList.html
No single data structure in the standard library provides both sorting and index-based retrieval. If the input set is limited to a few hundred or a few thousand elements, you can keep two data structures in parallel.
For example, ArrayList for index based retrieval and TreeMap (or priority queue) for sorting.

When do you know when to use a TreeSet or LinkedList?

What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
Taking in an unsorted array and adding its elements to a sorted structure.
Traversing through the sorted data and removing the right one.
Adding data (never removing) and returning that structure as an array.
When do you know when to use a TreeSet or LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion at a selected position or deletion.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]
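Assuming elements with a natural ordering, a minimal sketch of the three steps with a TreeSet (the removal condition is invented):

import java.util.Arrays;
import java.util.TreeSet;

public class TreeSetFlow {
    public static void main(String[] args) {
        // 1. Take an unsorted array and add it to a sorted structure.
        TreeSet<Integer> sorted = new TreeSet<>(Arrays.asList(42, 7, 19, 3));

        // 2. Traverse the sorted data and remove the "right" element
        //    (here, arbitrarily, the first element greater than 10).
        sorted.remove(sorted.higher(10));

        // 3. Keep adding data and return the structure as an array.
        sorted.add(25);
        System.out.println(Arrays.toString(sorted.toArray())); // [3, 7, 25, 42]
    }
}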
The genuine power and advantage of TreeSet lies in the interface it implements: NavigableSet.
Why is it so powerful, and in which cases?
The NavigableSet interface adds, for example, these three nice methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods make it possible to organize very fast range searches.
Example: we need to find all the names from "Milla" up to (but not including) "Wladimir":
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements; it locates the first and last elements of the range and returns a view containing the elements in between.
TreeSet:
TreeSet uses a red-black tree underneath, so the set can be thought of as a dynamic search tree. When you need a structure that is read from and written to frequently and must also keep its elements ordered, TreeSet is a good choice.
If you want to keep the data sorted and the workload is append-mostly, a TreeSet with a Comparator is your best bet. With a LinkedList your code would have to traverse the list from the beginning to decide where to place each item: LinkedList is O(n) for such positional operations, while TreeSet is O(log n) for the basic ones.
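For instance, a TreeSet ordered by a supplied Comparator rather than the elements' natural ordering (the case-insensitive ordering here is just an example):

import java.util.TreeSet;

public class ComparatorTreeSet {
    public static void main(String[] args) {
        // Each add/remove/contains is O(log n); iteration yields sorted order.
        TreeSet<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        names.add("delta");
        names.add("Alpha");
        names.add("charlie");
        System.out.println(names); // [Alpha, charlie, delta]
    }
}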
The most important point when choosing a data structure are its inherent limitations. For example if you use TreeSet to store objects and during run-time your algorithm changes attributes of these objects which affect equal comparisons while the object is an element of the set, get ready for some strange bugs.
The Javadoc for the Set interface states:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc

searching an unordered list without converting it to an array

Is there a way to first sort and then search for an object within a linked list of objects?
I thought I would just use one of the sorting methods and then a binary search. What do you think?
Thanks
This is not a good approach, IMO. If you use Collections.sort(list), where the list is a LinkedList, this copies the list to a temporary array, sorts it, and then copies it back to the list; i.e. O(NlogN) to sort plus 2 * O(N) copies. But when you then try to do a binary search (e.g. using Collections.binarySearch(list)), each search will do O(N) list traversal operations. So you may as well not have bothered sorting the list!
Another approach would be to convert the list to an array or an ArrayList, and then sort and search that array / ArrayList. That gives one copy plus one sort to setup, and O(logN) for each search.
But neither of these is the best approach. That depends on how many times you need to perform search operations.
If you simply want to do one search on the list, then calling list.contains(...) is O(N) ... and that is better than anything involving sorting and binary searching.
If you want to do multiple searches on a list that never changes, you're probably better off putting the list entries into a HashSet. Constructing a HashSet is O(N) and searching is O(1). (This assumes you don't need your own comparator.)
If you want to do multiple searches on a list that keeps changing where the order IS NOT significant, replace the list with a HashSet. The incremental cost of updating the HashSet will be O(1) for each addition/removal, and O(1) for each search.
If you want to do multiple searches on a list that keeps changing and the order IS significant, replace the list with an insertion-ordered LinkedHashMap. That will be O(1) for each addition/removal, and O(1) for each search ... but with larger constants of proportionality than for a HashSet.
java.util.Collections#sort()
java.util.Collections#binarySearch()
The Collections class has lots of other amazing methods to make a programmer's life easier.
Note that the sort method's implementation will indeed convert the list to an array internally, but you need not explicitly convert the list into an array before calling the method. :)
You may want to question whether searching a sorted list is the best option for your use case, as it does not perform particularly well here: the sort is O(N log N), and while the binary search does only O(log N) comparisons, on a linked list it still needs O(N) traversal. You might instead consider making a HashSet out of your list elements and searching it via the contains method, which is O(1), if you just want to see whether an element exists. It would be much easier to give you advice on which collection to consider if you could explain more about your use case.
EDIT: Consider performance issues of List sorting if you plan to do this for large lists.
