Efficiently removing an element added to a ConcurrentQueue - java

In principle it is easy to remove an element from ConcurrentLinkedQueue or a similar implementation. For example, the Iterator for that class supports efficient O(1) removal of the current element:
public void remove() {
Node<E> l = lastRet;
if (l == null) throw new IllegalStateException();
// rely on a future traversal to relink.
l.item = null;
lastRet = null;
}
I want to add an element to the queue with add(), and then later delete that exact element from the queue. The only options I can see are:
Save a reference to the object and call ConcurrentLinkedQueue.remove(Object o) with the object - but this forces a traversal of the whole queue in the worst case (and half on average with a random add and removal pattern).
This has the further issue that it doesn't necessarily remove the same object I inserted. It removes an equal object, which may very well be a different one if multiple objects in my queue are equal.
Use ConcurrentLinkedDeque instead, then addLast() my element, then immediately grab a descendingIterator() and iterate until I find my element (which will often be the first one, but may be later since I'm effectively swimming against the tide of concurrent additions).
In addition to being awkward and potentially quite slow, this forces me to use the Deque class, which in this case is much more complex and slower for many operations (check out Iterator.remove() for that class!).
Furthermore, this solution still has a subtle failure mode if identical (i.e., == identity) objects can be inserted, because I might find the object inserted by someone else, but that can be ignored in the usual case where that is not possible.
Both solutions seem really awkward, but deleting an arbitrary element in these kinds of structures seems like a core operation. What am I missing?
It occurs to me this is a general issue with other concurrent lists and deques, and even with non-concurrent structures like LinkedList.
C++ offers it in the form of methods like insert, which return an iterator that can later be passed to erase.

Nope, there's not really any way of doing this in the Java APIs; it's not really considered a core operation.
For what it's worth, there are some significant technical difficulties to how you would do it in the first place. For example, consider ArrayList. If adding an element to an ArrayList gave you another reference object that told you where that element was...
you'd be adding an allocation to every add operation
each object would have to keep track of one (or more!) references to its "pointer" object, which would at least double the memory consumption of the data structure
every time you inserted an element into the ArrayList, you'd have to update the pointer objects for every other element whose position shifted
In short, while this might be a reasonable operation for linked-list-based data structures, there's not really any good way of fitting it into a more general API, so it's pretty much omitted.

If you specifically need this capability (e.g. you are going to have massive collections) then you will likely need to implement your own collection that returns a reference to the entry on add and has the ability to remove in O(1).
class MyLinkedList<V> {
public class Entry {
private Entry next;
private Entry prev;
private V value;
}
public Entry add(V value) {
...
}
public void remove(Entry entry) {
...
}
}
In other words you are not removing by value but by reference to the collection entry:
MyLinkedList<Integer> intList = new MyLinkedList<>();
MyLinkedList<Integer>.Entry entry = intList.add(15);
intList.remove(entry);
That's obviously a fair amount of work to implement.
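For illustration, here is one possible way to fill in the skeleton above. It is a minimal, single-threaded sketch, not a tested library; the linked flag, the owner() check, size() and toList() are additions that the original answer does not mention.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch completing the MyLinkedList skeleton (not thread-safe).
final class MyLinkedList<V> {
    public final class Entry {
        private Entry next;
        private Entry prev;
        private final V value;
        private boolean linked = true; // false once removed, to reject double removal

        private Entry(V value) { this.value = value; }

        public V value() { return value; }

        private MyLinkedList<V> owner() { return MyLinkedList.this; }
    }

    private Entry head;
    private Entry tail;
    private int size;

    // Appends value at the tail and returns a handle to its entry: O(1).
    public Entry add(V value) {
        Entry e = new Entry(value);
        e.prev = tail;
        if (tail == null) head = e; else tail.next = e;
        tail = e;
        size++;
        return e;
    }

    // Unlinks exactly this entry (by reference, never via equals()): O(1).
    public void remove(Entry entry) {
        if (!entry.linked || entry.owner() != this) {
            throw new IllegalArgumentException("entry is not in this list");
        }
        if (entry.prev == null) head = entry.next; else entry.prev.next = entry.next;
        if (entry.next == null) tail = entry.prev; else entry.next.prev = entry.prev;
        entry.prev = null;
        entry.next = null;
        entry.linked = false;
        size--;
    }

    public int size() { return size; }

    // Copies the values in list order, mainly for inspection.
    public List<V> toList() {
        List<V> out = new ArrayList<>(size);
        for (Entry e = head; e != null; e = e.next) out.add(e.value);
        return out;
    }
}
```

Because remove() takes the Entry rather than the value, removing one of two equal values removes exactly the one you added.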

Related

Sorting a PriorityQueue with Queue Size [duplicate]

I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the objects' fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating the values and reinserting it, as insertion is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull out of the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashcode of its elements changes after insertion).
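A small sketch of the remove-and-reinsert pattern, using a hypothetical Task class with a mutable priority field. PriorityQueue.remove(Object) locates the element with equals(), which here is identity because Task does not override it.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class ReinsertDemo {
    // Hypothetical element type whose priority may change after insertion.
    static final class Task {
        final String name;
        int priority;
        Task(String name, int priority) { this.name = name; this.priority = priority; }
    }

    // Never mutate a queued element in place: take it out, change it, put it back.
    static void updatePriority(PriorityQueue<Task> pq, Task task, int newPriority) {
        boolean wasQueued = pq.remove(task); // O(n) search, then O(log n) repair
        task.priority = newPriority;
        if (wasQueued) pq.add(task);         // O(log n): sifted into place with the new key
    }

    static String drainOrder() {
        PriorityQueue<Task> pq = new PriorityQueue<>(Comparator.comparingInt((Task t) -> t.priority));
        Task a = new Task("a", 5), b = new Task("b", 3), c = new Task("c", 7);
        pq.add(a);
        pq.add(b);
        pq.add(c);
        updatePriority(pq, c, 1);            // c now has the smallest key
        StringBuilder order = new StringBuilder();
        while (!pq.isEmpty()) order.append(pq.poll().name);
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainOrder()); // prints "cba"
    }
}
```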
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you will want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
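A rough sketch of such a wrapper, assuming a plain ArrayList as storage; the class name ScanOnDequeueQueue is made up for this example. It is deliberately not thread-safe, so callers would still need the synchronization mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;

// Stores elements unsorted and compares only at dequeue time, so an element's
// priority may change while it waits. Not thread-safe.
final class ScanOnDequeueQueue<E> {
    private final List<E> items = new ArrayList<>();
    private final Comparator<? super E> comparator;

    ScanOnDequeueQueue(Comparator<? super E> comparator) {
        this.comparator = comparator;
    }

    void enqueue(E e) {
        items.add(e); // O(1): no ordering work at insertion
    }

    E dequeue() { // O(n): linear scan for the smallest element
        if (items.isEmpty()) throw new NoSuchElementException();
        int best = 0;
        for (int i = 1; i < items.size(); i++) {
            if (comparator.compare(items.get(i), items.get(best)) < 0) best = i;
        }
        E result = items.get(best);
        E last = items.remove(items.size() - 1);        // O(1) removal from the end...
        if (best < items.size()) items.set(best, last); // ...then fill the hole with it
        return result;
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}
```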
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease the key of an entry in the heap, rather than O(log n) as in an ordinary binary heap.
One easy solution is to just add the element to the priority queue again. It does not change the way you extract the elements; it consumes more space, but not enough to affect your running time much.
To illustrate, consider Dijkstra's algorithm below:
public int[] dijkstra() {
int[] distance = new int[this.vertices];
int[] previous = new int[this.vertices];
for (int i = 0; i < this.vertices; i++) {
distance[i] = Integer.MAX_VALUE;
previous[i] = -1;
}
distance[0] = 0;
previous[0] = 0;
PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
addValues(pQueue, distance);
while (!pQueue.isEmpty()) {
Node n = pQueue.remove();
// unreachable vertex: distance + weight would overflow int, so skip it
if (distance[n.position] == Integer.MAX_VALUE) continue;
List<Edge> neighbours = adjacencyList.get(n.position);
for (Edge neighbour : neighbours) {
if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
distance[neighbour.destination] = distance[n.position] + neighbour.weight;
previous[neighbour.destination] = n.position;
// a duplicate entry with the new priority; the old entry stays in the queue
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
}
}
}
return previous;
}
The line of interest here is:
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of the particular node by removing it and adding it again; rather, I am just adding a new node with the same value but a different priority.
Now, at extraction time I will always get this node first, because I have implemented a min-heap here; the node with the greater distance (lower priority) will always be extracted afterwards, and by then all neighbouring nodes will already have been relaxed.
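A self-contained sketch of the same "just add it again" idea, using int[] pairs instead of the answer's Node and Edge classes (which are not shown), with vertex 0 as the source and non-negative weights assumed. The stale-entry check is an optional addition: the version above still gives correct results without it, because relaxing from an outdated entry never improves a distance, but skipping avoids that wasted work.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

final class LazyDijkstra {
    // graph[u] holds {v, weight} pairs; returns shortest distances from vertex 0.
    static int[] shortestDistances(int[][][] graph) {
        int[] dist = new int[graph.length];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[0] = 0;
        // Entries are {vertex, distanceWhenQueued}; a vertex may appear several times.
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        pq.add(new int[] {0, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue; // stale duplicate: a shorter path was already found
            for (int[] edge : graph[u]) {
                int v = edge[0], w = edge[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[] {v, dist[v]}); // add a new entry instead of updating the old one
                }
            }
        }
        return dist;
    }
}
```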
Without reimplementing the priority queue yourself (so by only using java.util.PriorityQueue) you have essentially two main approaches:
1) Remove and put back
Remove element then put it back with new priority. This is explained in the answers above. Removing an element is O(n) so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check if its priority is the same as in your map. If not, then ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), where f(n) is the number of 'change priority' operations per item and n* is the actual size of your heap (which is n·f(n)).
I believe that if f(n) is O(n/log n) (for example f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by priority I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
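A minimal sketch of approach 2, assuming int priorities where smaller values come out first; LazyPriorityQueue and addOrChange are invented names. The map stores each item's current priority, and queue entries whose priority no longer matches the map are discarded on poll.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class LazyPriorityQueue<T> {
    // One queued (item, priority) pair; older pairs for the same item become stale.
    private static final class Entry<E> {
        final E item;
        final int priority;
        Entry(E item, int priority) { this.item = item; this.priority = priority; }
    }

    private final PriorityQueue<Entry<T>> queue =
            new PriorityQueue<>(Comparator.comparingInt((Entry<T> e) -> e.priority));
    private final Map<T, Integer> current = new HashMap<>(); // item -> its live priority

    // Adds an item or changes its priority: O(log n*), no search for the old entry.
    void addOrChange(T item, int priority) {
        current.put(item, priority);
        queue.add(new Entry<>(item, priority));
    }

    // Returns the item with the smallest live priority, or null if none is left.
    T poll() {
        while (!queue.isEmpty()) {
            Entry<T> e = queue.poll();
            Integer live = current.get(e.item);
            if (live != null && live == e.priority) { // live is unboxed before comparing
                current.remove(e.item);
                return e.item;
            }
            // stale entry: its priority no longer matches the map, so ditch it and poll again
        }
        return null;
    }
}
```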
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can first of all remove and reinsert (which in fact is fairly expensive, as removing requires a linear scan over the heap!).
Second, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of elements in a hashmap. This way, when the priority of an element changes, it can repair the heap. Third, you can look for a Fibonacci heap, which does the same.
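A compact sketch of that idea, assuming int priorities (smaller first) and distinct elements with proper equals and hashCode. IndexedMinHeap and its methods are invented names, not the API of any particular library's UpdatableHeap. The position map is updated on every swap, which is what lets addOrUpdate repair the heap in O(log n) instead of searching for the element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class IndexedMinHeap<T> {
    private final List<T> heap = new ArrayList<>();
    private final Map<T, Integer> index = new HashMap<>();    // element -> position in heap
    private final Map<T, Integer> priority = new HashMap<>(); // element -> current priority

    void addOrUpdate(T item, int newPriority) {
        Integer pos = index.get(item);
        priority.put(item, newPriority);
        if (pos == null) {            // new element: append and sift up
            heap.add(item);
            index.put(item, heap.size() - 1);
            siftUp(heap.size() - 1);
        } else {                      // changed element: at most one direction will move it
            siftUp(pos);
            siftDown(index.get(item));
        }
    }

    T poll() {
        if (heap.isEmpty()) throw new NoSuchElementException();
        T top = heap.get(0);
        swap(0, heap.size() - 1);
        heap.remove(heap.size() - 1);
        index.remove(top);
        priority.remove(top);
        if (!heap.isEmpty()) siftDown(0);
        return top;
    }

    boolean isEmpty() { return heap.isEmpty(); }

    private int prio(int i) { return priority.get(heap.get(i)); }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (prio(i) >= prio(parent)) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        int n = heap.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && prio(left) < prio(smallest)) smallest = left;
            if (right < n && prio(right) < prio(smallest)) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Swaps two slots and keeps the position map in sync.
    private void swap(int a, int b) {
        T x = heap.get(a), y = heap.get(b);
        heap.set(a, y);
        heap.set(b, x);
        index.put(y, a);
        index.put(x, b);
    }
}
```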
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates and then batches of pull operations.
If only the head's priority has changed, you can re-sift it like this:
if(!priorityQueue.isEmpty()) {
priorityQueue.add(priorityQueue.remove());
}
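Something I've tried, and it works so far, is peeking to see whether the object you're changing is the same reference as the head of the PriorityQueue. If it is, you poll(), change it, then re-insert it; otherwise you change it without polling, relying on the heap being re-sifted when the head is later polled. Be aware, though, that poll() only sifts down the element moved into the root, so a changed element elsewhere in the heap is not guaranteed to end up in the right position.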
DOWNSIDE: This changes the priority for Objects with the same Priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods,
it's important to point out that .remove(...) runs in O(n) time.
Depending on the heap implementation,
updating the priority of a value can be done in O(log n) or even O(1) time,
therefore this wrapper suggestion may fall short of common expectations.
If you want to minimize your effort to implement,
as well as the risk of bugs of any custom solution,
then a wrapper that performs re-insert looks easy and safe.
If you want the implementation to be faster than O(n),
then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the entry as removed, and adding a new entry with the revised priority.
This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation,
and applied it to implement a reusable data structure in Java (see caveats at the bottom):
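import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
public class UpdatableHeap<T> {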
private final PriorityQueue<Node<T>> pq = new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
private final Map<T, Node<T>> entries = new HashMap<>();
public void addOrUpdate(T value, int priority) {
if (entries.containsKey(value)) {
entries.remove(value).removed = true;
}
Node<T> node = new Node<>(value, priority);
entries.put(value, node);
pq.add(node);
}
public T pop() {
while (!pq.isEmpty()) {
Node<T> node = pq.poll();
if (!node.removed) {
entries.remove(node.value);
return node.value;
}
}
throw new IllegalStateException("pop from empty heap");
}
public boolean isEmpty() {
return entries.isEmpty();
}
private static class Node<T> {
private final T value;
private final int priority;
private boolean removed = false;
private Node(T value, int priority) {
this.value = value;
this.priority = priority;
}
}
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapped around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.

Node reference into linked list like .Net has, to enable O(1) item insertion

.Net's LinkedList has a nice basic linked list feature that allows me to keep a node reference, a "pointer" into a linked list so to speak, and use that reference to navigate and manipulate the linked list from there in an O(1) fashion. To wit:
LinkedList<string> linkedList = new LinkedList<string>();
LinkedListNode<string> cur = linkedList.First;
LinkedListNode<string> rememberThis = null;
do
{
if (...)
rememberThis = cur;
} while ((cur = cur.Next) != null);
if (rememberThis != null)
linkedList.AddAfter(rememberThis, "added-value");
I'm failing to see how I can do the same in Java, namely
Iterating through a LinkedList (this of course is O(n))
Making note of a list node
Use that node reference even after further iteration for O(1) insertion
Java does give me access to a ListIterator, which allows me to do manipulation of the list around the item where I'm at, but I cannot seem to iterate on, while holding on to a previous node.
Am I missing something?
Am I missing something?
No. The LinkedList#ListItr class doesn't have a bookmark, so you cannot keep iterating while holding on to a previous node.
There's no O(1) method addAfter(Node node, E element) in LinkedList, because LinkedList#Node is private. There's add(int index, E element) which is O(n). Too sad.
A workaround is to use two ListIterators. One keeps iterating, while the other stops at the position you want to remember. Then you can call ListIterator#add(E e) on the second one at the end, which is O(1). But the first one must not modify the list, otherwise it will break the second one.
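A sketch of the two-ListIterator workaround, with hypothetical names. The trick is to advance the bookmark in lockstep with the scanning iterator until the match, so positioning it costs O(1) per step rather than a separate O(n) listIterator(index) call; only the bookmark ever modifies the list, and only after scanning has finished.

```java
import java.util.LinkedList;
import java.util.ListIterator;

final class BookmarkDemo {
    // Inserts value right after the first element equal to target.
    // Returns false (and leaves the list unchanged) if nothing matched.
    static boolean addAfterFirst(LinkedList<String> list, String target, String value) {
        ListIterator<String> scan = list.listIterator();     // keeps iterating to the end
        ListIterator<String> bookmark = list.listIterator(); // stops at the remembered spot
        boolean found = false;
        while (scan.hasNext()) {
            String s = scan.next();          // read-only: never modifies the list
            if (!found) {
                bookmark.next();             // advance in lockstep, O(1) per step
                found = target.equals(s);
            }
        }
        if (found) bookmark.add(value);      // O(1) insertion right after the match
        return found;
    }
}
```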
No, don't do that; that will break. If you ever modify the LinkedList structurally later, a ConcurrentModificationException will be thrown the next time you move the ListIterator's cursor, and there is no way around it. This is known as fail-fast behavior.
Anyway, Iterators aren't meant to hold a cursor in a list for a long time. And currently there is no way to hold a cursor to a certain position in a list, including LinkedLists, for a long time, even in OpenJDK 9 EA, IIRC. The reason behind it may be the ambiguity of how to move the existing cursors. This may be obvious in many situations, but not always.
After all, it's (almost) impossible to add it to a superinterface of LinkedList (Queue, Deque, List) now. (This is clearly an API design fault!) You can create your own version of LinkedList to implement that.
If you really want to keep a reference somehow, you will have to hack the internals with reflection, which is probably not worth it at all.
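A short demonstration of the fail-fast behavior described above, assuming the OpenJDK LinkedList (the Javadoc calls fail-fast checks best-effort, so this is not a hard guarantee): once one iterator adds an element, an iterator parked elsewhere in the list throws on its next move.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.ListIterator;

final class FailFastDemo {
    // Returns true if a parked iterator fails fast after another iterator changes the list.
    static boolean parkedIteratorBreaks() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> parked = list.listIterator(1); // cursor between "a" and "b"
        ListIterator<String> walker = list.listIterator();
        walker.next();
        walker.add("x");       // structural modification through the other iterator
        try {
            parked.next();     // checks its expected modification count first
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```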

Moving pointers instead of objects in some kind of list or queue in Java

Java offers a LinkedList implementation of the List interface which is actually a doubly-linked list. If we do the following:
linkedlist.remove(obj);
and then:
linkedlist.add(obj);
we actually remove the object obj from the linked list and reinsert it at the right end (tail).
We can also implement a linked list manually with nodes, head, and tail. In languages such as C++, which have some low-level characteristics, one can use pointers for the next and previous objects of obj. Thus we don't actually have to remove the item, but just update the previous and next pointers.
Is there any data structure in Java with which we can have the same effect (and thus the same performance gain of removing only "pointers" instead of the objects themselves)?
Note that I would like to use a ready-to-go data structure instead of manually writing my own linked list implementation (and perhaps reinventing the wheel). Moreover, please note that it does not necessarily have to be a linked list; it might be, for example, some kind of queue such as an ArrayDeque.
EDIT: To put it a little differently, if internally the LinkedList implementation of the List interface in Java makes use of prev and next pointers, then why is l.remove(obj) O(n) and not O(1)? And why, in practice, when you have a LinkedList with many millions of objects (as in my case), does this removal and re-insertion take so long? (Same with ArrayList, same with ArrayDeque: a very long time.)
Java does exactly the same thing as C++. All references to objects are pointers. So, in a Node like
public class Node {
private Object value;
private Node next;
private Node previous;
}
value, next and previous are pointers (called references in Java) respectively to the value of the node, the next node and the previous node.
The difference with C++ is that you don't have pointer arithmetics: value++, for example, doesn't compile and doesn't make the pointer reference what is located at the next memory address.
EDIT:
The LinkedList class doesn't expose its nodes to the outside. They're completely private to the LinkedList. So, removing an object consists in iterating over all the nodes to find the one having a value which is equal (in terms of Object.equals()) to the given object, and to remove the found node from the list. Removing the node consists in making the previous point to the next, and vice-versa. This is why removing an object is O(n). Of course, if you had access to the Node and were able to remove it, the operation would be O(1). If you need that, you'll have to implement your own LinkedList, exactly the same way as you would do it in C++. References are pointers.
To answer the question about why remove(Object obj) is O(n):
First, in order for it to be O(1), you'd need to give remove the actual Node reference, not the Object. The Object won't point back to the Node. In order to find the correct Node to return, the code must either search the list to find the Node that contains the object, or keep a hash map that would let it find the Node. The actual LinkedList implementation does a simple linear search. However, even a hash map would, technically, be O(n) since there's the possibility that all the objects on the list have the same hash code, although it would still be much faster than a linear search.
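For the specific "remove, then re-append at the tail" case in the question, the JDK already pairs a hash lookup with a doubly linked list in LinkedHashSet (and LinkedHashMap). A sketch, assuming the elements are distinct and have consistent equals and hashCode; remove and add are then expected O(1), subject to the hash-collision caveat above. Re-adding an element that is still present does not change its position, so it has to be removed first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class MoveToTailDemo {
    // Moves obj to the tail: hash lookup plus unlink, then append.
    static <T> void moveToTail(LinkedHashSet<T> set, T obj) {
        if (set.remove(obj)) set.add(obj);
    }

    static List<String> demo() {
        LinkedHashSet<String> set = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        moveToTail(set, "a");
        return new ArrayList<>(set); // iteration follows insertion order: [b, c, a]
    }
}
```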
Second, remove(Object obj) is defined in terms of equals. If there is a different object obj2 that was added to the list, and obj2.equals(obj) is true, then remove(obj) will remove obj2 if the obj reference itself was never added to the list.
To really do this right, you'd need either an add method that returns a node reference, so that your program could keep track of the node reference and use that as the remove argument; or you could require that objects on the list implement some sort of NodePointer interface:
interface NodePointer {
void setNodePointer(Object node);
Object getNodePointer();
}
that the list would then use to stuff the node pointers into the objects. (But that would probably mean an object could only live on one linked list at a time, a restriction that LinkedList doesn't impose.) In either case, I don't think this is something the Java library supports.
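A minimal sketch of the second option, reusing the NodePointer interface from the answer (its methods are implemented as public). IntrusiveList and Tagged are hypothetical names. As the answer predicts, the list refuses an element that already carries a node pointer, so an object can live on only one such list at a time.

```java
// The NodePointer interface from the answer.
interface NodePointer {
    void setNodePointer(Object node);
    Object getNodePointer();
}

// Hypothetical element type: stores the handle the list gives it.
final class Tagged implements NodePointer {
    final String name;
    private Object node;
    Tagged(String name) { this.name = name; }
    @Override public void setNodePointer(Object node) { this.node = node; }
    @Override public Object getNodePointer() { return node; }
}

final class IntrusiveList<E extends NodePointer> {
    private static final class Node<E> {
        final E value;
        final Object owner; // the list this node belongs to
        Node<E> prev, next;
        Node(E value, Object owner) { this.value = value; this.owner = owner; }
    }

    private Node<E> head, tail;
    private int size;

    void add(E element) {
        if (element.getNodePointer() != null) {
            throw new IllegalStateException("element is already in a list");
        }
        Node<E> n = new Node<>(element, this);
        n.prev = tail;
        if (tail == null) head = n; else tail.next = n;
        tail = n;
        element.setNodePointer(n); // stuff the node pointer into the object
        size++;
    }

    // O(1): the element itself says where its node is; no search, no equals().
    void remove(E element) {
        Object p = element.getNodePointer();
        if (!(p instanceof Node) || ((Node<?>) p).owner != this) {
            throw new IllegalArgumentException("element is not in this list");
        }
        @SuppressWarnings("unchecked")
        Node<E> n = (Node<E>) p;
        if (n.prev == null) head = n.next; else n.prev.next = n.next;
        if (n.next == null) tail = n.prev; else n.next.prev = n.prev;
        element.setNodePointer(null);
        size--;
    }

    int size() { return size; }

    E first() { return head == null ? null : head.value; }
}
```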

Can an iterator change the collection it is iterating over? Java

I'm attempting to use the number of iterations from an iterator as a counter, but was wondering the ramifications of doing so.
private int length(Iterator<?> it) {
int i = 0;
while(it.hasNext()) {
it.next();
i++;
}
return i;
}
This works fine, but I'm worried about what the iterator may do behind the scenes. Perhaps as I'm iterating over a stack, it pops the items off the stack, or if I'm using a priority queue, and it modifies the priority.
The javadoc says this about Iterator:
next
E next()
Returns the next element in the iteration.
Returns:
the next element in the iteration
Throws:
NoSuchElementException - if the iteration has no more elements
I don't see a guarantee that iterating over this unknown collection won't modify it. Am I thinking of unrealistic edge cases, or is this a concern? Is there a better way?
The Iterator simply provides an interface into some sort of stream, therefore not only is it perfectly possible for next() to destroy data in some way, but it's even possible for the data in an Iterator to be unique and irreplaceable.
We could come up with more direct examples, but an easy one is the Iterator in DirectoryStream. While a DirectoryStream is technically Iterable, it only allows one Iterator to be constructed, so if you tried to do the following:
Path dir = ...
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
int count = length(stream.iterator());
for (Path entry: stream) {
...
}
}
You would get an exception in the foreach block, because the stream can only be iterated once. So in summary, it is possible for your length() method to change objects and lose data.
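A runnable variant of the example above, assuming a fresh, empty temporary directory is an acceptable stand-in for dir. It asks the stream for a second iterator directly, since that is the call the for-each loop makes; DirectoryStream.iterator() throws IllegalStateException once its iterator has already been returned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

final class SingleUseIteratorDemo {
    // The length() helper from the question.
    static int length(Iterator<?> it) {
        int i = 0;
        while (it.hasNext()) {
            it.next();
            i++;
        }
        return i;
    }

    // Returns true if the stream rejects a second iterator after length() used the first.
    static boolean secondIteratorRejected() {
        try {
            Path dir = Files.createTempDirectory("single-use-demo");
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                length(stream.iterator()); // consumes the stream's one iterator
                try {
                    stream.iterator();     // what a for-each loop over stream would call
                    return false;
                } catch (IllegalStateException e) {
                    return true;
                }
            } finally {
                Files.delete(dir);         // still empty, and the stream is closed by now
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```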
Furthermore, there's no reason an Iterator has to be associated with some separate data-store. Take for example an answer I gave a few months ago providing a clean way to select n random numbers. By using an infinite Iterator we are able to provide, filter, and pass around arbitrarily large amounts of random data lazily, no need to store it all at once, or even compute them until they're needed. Because the Iterator doesn't back any data structure, querying it is obviously destructive.
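The referenced answer is not reproduced here, but a minimal iterator of that kind might look like the following (RandomIterator and its constructor parameters are illustrative). Nothing is stored, so every next() consumes a value that no other caller will see; note that length() from the question would loop forever on it.

```java
import java.util.Iterator;
import java.util.Random;

// An Iterator that backs no data structure: each next() draws a new number.
final class RandomIterator implements Iterator<Integer> {
    private final Random random;
    private final int bound;

    RandomIterator(long seed, int bound) {
        this.random = new Random(seed);
        this.bound = bound;
    }

    @Override
    public boolean hasNext() {
        return true; // infinite: there is always another value
    }

    @Override
    public Integer next() {
        return random.nextInt(bound); // value in [0, bound)
    }
}
```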
Now that said, these examples don't make your method bad. Notice that the Guava library (which everyone should be using) provides an Iterators class with exactly the behavior you detail above, called size() to conform with the Collections Framework. The burden is then on the user of such methods to be aware of what sort of data they're working with, and avoid making careless calls such as trying to count the number of results in an Iterator that they know cannot be replaced.
As far as I can tell, the Collection specification does not explicitly state that iterating over a collection does not modify it, but no classes in the standard library show that behaviour (actually at least one does, see dimo414's answer), so any class that did would be highly suspect. I don't think you need to worry about this.
Note that the Guava library implements Iterators.size() and Iterables.size() in the same way that you are, so clearly they find it safe in the general case.
No, iterating over a collection will not modify the collection. The Iterator class does have a remove() method, which is the only safe way of removing an element from a collection during iteration. But simply calling hasNext() and next() will not modify the collection.
Keep in mind that if you modify the object returned by next(), those changes will be present in your collection.
Think about it -- methods that return things are (if they are written correctly) accessor methods, meaning that they just return data. They do not modify it (they are not mutator methods).
Here's an example I had on my disk of how an iterator might be implemented. As you can see, no values are actually modified.
public class ArraySetIterator implements Iterator
{
private int nextIndex;
private ArraySet theArraySet;
public ArraySetIterator (ArraySet a)
{
this.nextIndex = 0;
this.theArraySet = a;
}
public boolean hasNext ()
{
return this.nextIndex < this.theArraySet.size();
}
public Object next()
{
if (!hasNext()) throw new java.util.NoSuchElementException(); // required by the Iterator contract
return this.theArraySet.get(this.nextIndex++);
}
}

How to safely remove other elements from a Collection while iterating through the Collection

I'm iterating over a JRE Collection which enforces the fail-fast iterator concept, and thus will throw a ConcurrentModificationException if the Collection is modified while iterating, other than by using the Iterator.remove() method. However, I need to remove an object's "logical partner" if the object meets a condition, thus preventing the partner from also being processed. How can I do that? Perhaps by using a better collection type for this purpose?
Example.
myCollection<BusinessObject>
for (BusinessObject anObject : myCollection)
{
if (someConditionIsTrue)
{
myCollection.remove(anObjectsPartner); // throws ConcurrentModificationException
}
}
Thanks.
It's not a fault of the collection, it's the way you're using it. Modifying the collection while halfway through an iteration leads to this error (which is a good thing as the iteration would in general be impossible to continue unambiguously).
Edit: Having reread the question this approach won't work, though I'm leaving it here as an example of how to avoid this problem in the general case.
What you want is something like this:
for (Iterator<BusinessObject> iter = myCollection.iterator(); iter.hasNext(); )
{
BusinessObject anObject = iter.next();
if (someConditionIsTrue)
{
iter.remove();
}
}
If you remove objects through the Iterator itself, it's aware of the removal and everything works as you'd expect. Note that while I think all standard collections work nicely in this respect, Iterators are not required to implement the remove() method so if you have no control over the class of myCollection (and thus the implementation class of the returned iterator) you might need to put more safety checks in there.
An alternative approach (say, if you can't guarantee the iterator supports remove() and you require this functionality) is to create a copy of the collection to iterate over, then remove the elements from the original collection.
Edit: You can probably use this latter technique to achieve what you want, but then you still end up coming back to the reason why iterators throw the exception in the first place: What should the iteration do if you remove an element it hasn't yet reached? Removing (or not) the current element is relatively well-defined, but you talk about removing the current element's partner, which I presume could be at a random point in the iterable. Since there's no clear way that this should be handled, you'll need to provide some form of logic yourself to cope with this. In which case, I'd lean towards creating and populating a new collection during the iteration, and then assigning this to the myCollection variable at the end. If this isn't possible, then keeping track of the partner elements to remove and calling myCollection.removeAll would be the way to go.
You want to remove an item from a list and continue to iterate on the same list. Can you implement a two-step solution where in step 1 you collect the items to be removed in an interim collection and in step 2 remove them after identifying them?
Some thoughts (it depends on what exactly the relationship is between the two objects in the collection):
A Map with the object as the key and the partner as the value.
A CopyOnWriteArrayList, but you have to notice when you hit the partner
Make a copy into a different Collection object, and iterate over one while removing from the other. If this original Collection can be a Set, that would certainly be helpful in removal.
You could try finding all the items to remove first and then removing them once you have finished processing the entire list, skipping over the deleted items as you find them.
myCollection<BusinessObject>
List<BusinessObject> deletedObjects = new ArrayList<>(myCollection.size());
for (BusinessObject anObject : myCollection)
{
if (!deletedObjects.contains(anObject))
{
if (someConditionIsTrue)
{
deletedObjects.add(anObjectsPartner);
}
}
}
myCollection.removeAll(deletedObjects);
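A generic sketch of that two-pass idea. The condition and the partner lookup are passed in as a Predicate and a Function because the real business rule isn't shown; using a HashSet for the pending deletions keeps contains() O(1) instead of the linear list scan in the code above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

final class PartnerRemoval {
    static <T> void removePartners(Collection<T> items,
                                   Predicate<? super T> condition,
                                   Function<? super T, ? extends T> partnerOf) {
        Set<T> toDelete = new HashSet<>();         // O(1) contains(), unlike an ArrayList
        for (T item : items) {                     // read-only pass: nothing is modified yet
            if (toDelete.contains(item)) continue; // partner of an earlier item: skip it
            if (condition.test(item)) {
                T partner = partnerOf.apply(item);
                if (partner != null) toDelete.add(partner);
            }
        }
        items.removeAll(toDelete);                 // single modification after iterating
    }
}
```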
CopyOnWriteArrayList will do what you want.
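A small illustration of that approach, with a hypothetical partner() rule ("a1" pairs with "a2", and so on). Iteration runs over a snapshot, so removing the partner never throws, but the snapshot still contains the partner; hence the contains() check, which is the "notice when you hit the partner" step from an earlier answer. Each removal copies the backing array, so this suits small collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class CowDemo {
    // Hypothetical partner rule: "a1" <-> "a2", "b1" <-> "b2", and so on.
    static String partner(String s) {
        return s.charAt(0) + (s.endsWith("1") ? "2" : "1");
    }

    // Processes elements; processing an element that starts with "a" removes its partner.
    static List<String> process(List<String> input) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>(input);
        List<String> processed = new ArrayList<>();
        for (String s : list) {              // iterates a snapshot, so removal never throws
            if (!list.contains(s)) continue; // removed after the snapshot was taken: skip it
            processed.add(s);
            if (s.startsWith("a")) list.remove(partner(s));
        }
        return processed;
    }
}
```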
Why not use a Collection of all the original BusinessObjects and then a separate class (such as a Map) which associates them (i.e. creates partners)? Put these both as composite elements in their own class so that you can always remove the partner when a BusinessObject is removed. Don't make it the responsibility of the caller every time they need to remove a BusinessObject from the Collection.
IE
class BusinessObjectCollection implements Collection<BusinessObject> {
Collection<BusinessObject> objects;
Map<BusinessObject, BusinessObject> associations;
public void remove(BusinessObject o) {
...
// remove from collection and dissasociate...
}
}
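A simplified sketch of that composite, assuming each object has at most one partner; PairedCollection, associate() and the choice of LinkedHashSet plus HashMap are illustrative, and it implements only Iterable rather than the full Collection interface. It does not by itself make calling remove() during iteration safe; that still needs one of the two-pass approaches discussed in earlier answers.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The objects plus their partner associations, kept in sync in one place.
final class PairedCollection<T> implements Iterable<T> {
    private final Set<T> objects = new LinkedHashSet<>();
    private final Map<T, T> partners = new HashMap<>();

    void add(T o) { objects.add(o); }

    // Records a symmetric partnership; both objects are added if missing.
    void associate(T a, T b) {
        objects.add(a);
        objects.add(b);
        partners.put(a, b);
        partners.put(b, a);
    }

    // Removes o and its partner, if any; callers never handle partners themselves.
    boolean remove(T o) {
        if (!objects.remove(o)) return false;
        T partner = partners.remove(o);
        if (partner != null) {
            partners.remove(partner);
            objects.remove(partner);
        }
        return true;
    }

    boolean contains(T o) { return objects.contains(o); }

    int size() { return objects.size(); }

    // Read-only iteration, so callers cannot bypass remove() above.
    @Override
    public Iterator<T> iterator() {
        return Collections.unmodifiableSet(objects).iterator();
    }
}
```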
The best answer is the second, use an iterator.
