ConcurrentLinkedQueue$Node remains in heap after remove() - java

I have a multithreaded app writing and reading a ConcurrentLinkedQueue, which is conceptually used to back entries in a list/table. I originally used a ConcurrentHashMap for this, which worked well. A new requirement required tracking the order entries came in, so they could be removed in oldest first order, depending on some conditions. ConcurrentLinkedQueue appeared to be a good choice, and functionally it works well.
A configurable number of entries are held in memory, and when a new entry is offered once the limit has been reached, the queue is searched in oldest-first order for one that can be removed. Certain entries are not to be removed by the system and instead wait for client interaction.
What appears to be happening is that I have an entry at the front of the queue that occurred, say, 100K entries ago. The queue appears to hold the configured number of entries (size() == 100), but when profiling, I found ~100K ConcurrentLinkedQueue$Node objects in memory. This appears to be by design; glancing at the source for ConcurrentLinkedQueue, remove() merely nulls the reference to the stored object but leaves the linked node in place for iteration.
Finally, my question: is there a "better" lazy way to handle a collection of this nature? I love the speed of ConcurrentLinkedQueue; I just can't afford the unbounded leak that appears to be possible in this case. If not, it seems I'd have to create a second structure to track order, which may have the same issues, plus a synchronization concern.

What is actually happening here is that the remove method prepares a later polling thread to null out the linked reference.
ConcurrentLinkedQueue is a non-blocking, thread-safe Queue implementation. However, polling a node from the queue is a two-step process: first you null the value, then you null the reference. A CAS operation is a single atomic step, so it cannot offer immediate resolution for a poll.
What happens when you poll is that the first thread that succeeds gets the node's value and nulls it out; that thread then tries to null the reference. It is possible another thread will then come in and try to poll from the queue. To ensure this queue keeps its non-blocking property (that is, the failure of one thread will not lead to the failure of another), that new incoming thread checks whether the value is null; if it is, that thread nulls the reference and retries its poll().
So what you see happening here is the remove thread simply preparing any new polling thread to null the reference. Achieving a fully non-blocking remove function would, I think, be nearly impossible, because it would require three atomic steps: nulling the value, nulling the reference to that node, and finally re-pointing the node's parent to its successor.
To answer your last question: there is unfortunately no better way to implement remove and maintain the non-blocking nature of the queue, at least at this point. Once processors come out with 2- and 3-way CAS, that will become possible.
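To make the two-step idea concrete, here is a much-simplified sketch in the spirit of the Michael-Scott algorithm that ConcurrentLinkedQueue builds on. It is illustrative only (all names are made up, and the real class handles many more cases): poll first CASes the item to null to claim it, then tries to swing head past the node, and a thread that finds an already-nulled item helps advance head before retrying.

```java
import java.util.concurrent.atomic.AtomicReference;

class SketchQueue<E> {
    static final class Node<E> {
        final AtomicReference<E> item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E e) { item = new AtomicReference<>(e); }
    }

    private final Node<E> dummy = new Node<>(null);
    private final AtomicReference<Node<E>> head = new AtomicReference<>(dummy);
    private final AtomicReference<Node<E>> tail = new AtomicReference<>(dummy);

    public void offer(E e) {
        Node<E> n = new Node<>(e);
        for (;;) {
            Node<E> t = tail.get();
            if (t.next.compareAndSet(null, n)) { // step 1: link at the end
                tail.compareAndSet(t, n);        // step 2: swing tail (others may help)
                return;
            }
            tail.compareAndSet(t, t.next.get()); // help a stalled offer, then retry
        }
    }

    public E poll() {
        for (;;) {
            Node<E> h = head.get();
            Node<E> first = h.next.get();
            if (first == null) return null;      // empty
            E item = first.item.get();
            if (item != null && first.item.compareAndSet(item, null)) {
                head.compareAndSet(h, first);    // step 2: unlink, best effort
                return item;
            }
            head.compareAndSet(h, first);        // item already claimed: help and retry
        }
    }
}
```

Note how a node whose item was nulled stays linked until some later poll (or helping thread) advances head past it, which is exactly the behaviour the question observed with remove().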

The queue's main semantics are add/poll. If you use poll() on the ConcurrentLinkedQueue, it will be cleaned up as it should be. Based on your description, poll() should remove the oldest entry for you. Why not use it instead of remove()?
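A minimal sketch of that suggestion (the class and names are hypothetical): evict with poll(), which actually unlinks the oldest node, instead of searching with remove():

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class BoundedLog {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final int limit;

    BoundedLog(int limit) { this.limit = limit; }

    void add(String entry) {
        queue.add(entry);
        // size() is O(n) on this queue; acceptable here for small limits
        while (queue.size() > limit) {
            queue.poll(); // drops the oldest entry AND unlinks its node
        }
    }

    int size() { return queue.size(); }
}
```

This only works if the oldest entry is always removable; the questioner's requirement that some entries survive until client interaction would still need extra logic.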

Looking at the source code for 1.6.0_29, it seems that CLQ's iterator was modified to try removing nodes with null items. Instead of:
p = p.getNext();
The code is now:
Node<E> next = succ(p);
if (pred != null && next != null)
    pred.casNext(p, next);
p = next;
This was added as part of the fix for bug: http://bugs.sun.com/view_bug.do?bug_id=6785442
Indeed when I try the following I get an OOME with the old version but not with the new one:
Queue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
for (int i = 0; i < 10000; i++)
{
    for (int j = 0; j < 100000; j++)
    {
        queue.add(j);
    }
    boolean start = true;
    for (Iterator<Integer> iter = queue.iterator(); iter.hasNext(); )
    {
        iter.next();
        if (!start)
            iter.remove();
        start = false;
    }
    System.out.println(i);
}

Related

What would happen if putting and taking occur simultaneously when a Java LinkedBlockingQueue has only one element?

LinkedBlockingQueue has two locks, one for putting and one for taking. When the size of the queue is 1, I think two threads can lock and manipulate the queue simultaneously, which would cause undefined behavior. Am I wrong?
// method put:
putLock.lockInterruptibly();
...
while (count.get() == capacity) {
    notFull.await();
}
enqueue(node);

// method take:
takeLock.lockInterruptibly();
...
while (count.get() == 0) {
    notEmpty.await();
}
x = dequeue();

// method enqueue:
last = last.next = node;

// method dequeue:
Node<E> h = head;
Node<E> first = h.next;
h.next = h;
head = first;
E x = first.item;
first.item = null;
return x;
Clearly the put thread and the take thread can both acquire their locks when there's only one item in the queue, so they will execute the code in enqueue and dequeue respectively. I mean, if the take thread enters dequeue, won't all that pointer modification collide with the code in enqueue?
A link here says, "However when the queue is empty then the contention cannot be avoided, and so extra code is required to handle this common 'edge' case".
Is BlockingQueue completely thread safe in Java
The javadoc for BlockingQueue (the interface implemented by LinkedBlockingQueue) states this:
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.
The word "atomically" means that if two operations (for example a put and a take) happen simultaneously, the implementation will ensure that they behave according to the contract. The effect will be as if the put happened before the take, or vice versa. That applies to edge cases as well, such as your example of a queue with one element.
In fact, since put and take are blocking operations, the relative ordering of the two operations won't matter. With offer / poll or add / remove the order does matter, but you can't control it.
Please note that the above is based solely on what the javadoc says. Assuming that I have interpreted the javadoc correctly, then it applies to all1 BlockingQueue implementations, irrespective of whether they use one or two locks ... or none at all. If a BlockingQueue implementation doesn't behave as above, that is a bug!
1 - All implementations that implement the API correctly. That should cover all of the Java SE classes.
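As a small illustration of that contract (assuming nothing beyond java.util.concurrent), the sketch below starts a taker on an empty LinkedBlockingQueue; the take() blocks until the put happens, and the handoff is atomic:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingDemo {
    // Starts a taker first, then puts a value; take() blocks until the put lands.
    static int takeThenPut() throws Exception {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> taken = pool.submit(q::take); // blocks: queue is empty
            Thread.sleep(100);                            // give the taker time to block
            q.put(42);                                    // wakes the blocked taker
            return taken.get();                           // never a lost or partial value
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(takeThenPut()); // prints 42
    }
}
```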
put implementation of a LinkedBlockingQueue
public void put(E e) throws InterruptedException {
    // some lock and node code
    // the part that matters here
    try {
        while (count.get() == capacity) {
            notFull.await();
        }
        // put the item in the queue.
    } finally {
        // not important here
    }
}
Basically, in put, the calling thread waits until the count is below the maximum capacity before continuing.
Even though the thread putting the value on the queue grabs a lock that is different from the take thread, it waits to add it to the queue until the queue is not full.
take has a similar implementation with regards to notEmpty instead of notFull.
After 2 days of searching, I finally get it...
When the queue has only one item, according to LinkedBlockingQueue's design there are actually two nodes: the dummy head and the real item node (which last also points to). It's true that the put thread and take thread can both get their locks, but they modify different parts of the queue.
Put thread will call
last = last.next = node; // last points to the only item in queue
Take thread will call
Node<E> h = head;
Node<E> first = h.next; // first also points to the only item in queue
h.next = h;
head = first;
E x = first.item;
first.item = null;
return x;
The intersection of these two threads is the node that last points to in the put thread and first points to in the take thread.
Notice that the put thread only modifies last.next and the take thread only modifies first.item (plus the old head's next pointer). Although the two threads touch the same node instance, they modify different fields of it, so this doesn't cause any conflict.
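A quick way to convince yourself is a stress test (illustrative, not a proof): a capacity-1 LinkedBlockingQueue forces the put and take threads to contend on a one-element queue constantly, yet every element arrives exactly once:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OneElementStress {
    // Moves 1..n through a capacity-1 queue and returns the sum received.
    static long transfer(int n) throws InterruptedException {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += q.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return sum[0];
    }

    public static void main(String[] args) throws Exception {
        int n = 100_000;
        // every element arrives exactly once despite the constant contention
        System.out.println(transfer(n) == (long) n * (n + 1) / 2); // prints true
    }
}
```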

Java - Concurrent Modification Exception

I am getting a concurrent modification exception on the following code:
for (Iterator<Tile> iter = spawner.activeTiles.iterator(); iter.hasNext();) {
    Tile tile = iter.next();
    canvas.drawRect(tile, tile.getColor());
}
I understand that a ConcurrentModificationException happens when the collection is changed while it is being iterated (adding/removing inside the iteration). I also understand that it can happen with multithreading, which is where I think my problem is.
In my game I have a few timers that run on threads. I have a spawner, which adds values to activeTiles on each tick. I then have 2 timers, one for fading in and one for fading out. Without giving away my game, the tile is basically removed from the list when the fade out has finished, or when the player taps the tile. So there are a few places where tiles are removed from the list of tiles:
for (Iterator<Tile> iter = spawner.activeTiles.iterator(); iter.hasNext();) {
    Tile tile = iter.next();
    if (tile.contains(x, y) && tile.equals(spawner.activeTiles.get(0))) {
        vibrator.vibrate(50);
        tile.setBroken(true);
        score++;
        spawner.setTileDelayInfo();
        iter.remove();
    }
}
and before each new spawn, it removes all of the failed tiles:
private void removeFailedTiles() {
    for (Iterator<Tile> iter = activeTiles.iterator(); iter.hasNext();) {
        Tile tile = iter.next();
        if (tile.isFailed()) {
            iter.remove();
        }
    }
}
It almost seems to happen randomly. So I think it has to do something with timing, but I am new to this kind of exception and don't really know what to look for or why this is happening.
The good news: you nailed the root cause of the problem in your question - you can't have multiple threads accessing a list at the same time unless they're all just reading.
You can address this in one of two ways, depending on how the rest of your code operates. The most 'correct' way is steffen's answer: any list access should be guarded with a synchronized block, and that includes holding the lock for the full duration of any list iterations. Note that if you do this, you want to do as little work as possible while holding the lock - in particular, it's a bad idea to do any sort of listener callbacks while holding a lock.
Your second option is to use a CopyOnWriteArrayList, which is thread-safe and doesn't require any external synchronization - but any modifications to the list (add/remove/replace calls) become significantly more expensive.
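A small sketch of the CopyOnWriteArrayList option (Strings stand in for the game's Tile objects): iteration runs over a snapshot, so mutating the list mid-loop cannot throw ConcurrentModificationException. Note that you must remove through the list itself; the copy-on-write iterator's own remove() throws UnsupportedOperationException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIteration {
    // Removing while iterating is safe here because the loop runs over a snapshot.
    static List<String> removeDuringIteration() {
        List<String> tiles = new CopyOnWriteArrayList<>(new String[] {"a", "b", "c"});
        for (String tile : tiles) {
            // must remove via the list, not the iterator, on a COW list
            if (tile.equals("b")) tiles.remove(tile); // no ConcurrentModificationException
        }
        return tiles;
    }

    public static void main(String[] args) {
        System.out.println(removeDuringIteration()); // prints [a, c]
    }
}
```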
Multithreading can be a source of ConcurrentModificationExceptions. One can happen when one thread modifies the structure of the collection while another thread has an Iterator iterating over it. This leads to unexpected states in your application whenever a section of code needs a consistent view of the data, which is the case while you're iterating over a collection of Tiles.
You need to synchronize access to the activeTiles collection. Anything that modifies this collection structurally (add or remove), or iterates over it, must synchronize on the collection.
Add a synchronized (activeTiles) block around all code that iterates over or structurally modifies activeTiles. This includes all 3 code snippets you've provided here.
Alternatively, you can make the 3 methods corresponding to your code snippets synchronized.
Either way, no Thread can execute any of the synchronized blocks while another Thread is inside its synchronized section, preventing the ConcurrentModificationException.
It's not safe to remove elements with an Iterator that supports element removal while the collection is being iterated or modified in another thread.
Acquire a Lock in all threads on activeTiles before iterating them.
You might want to make your list thread-safe. Use Collections.synchronizedList().
threadSafeActiveTiles = Collections.synchronizedList(activeTiles);
Mind that you must synchronize on that list when iterating over it:
synchronized (threadSafeActiveTiles) {
    for (Iterator<Tile> it = threadSafeActiveTiles.iterator(); it.hasNext();) {
        Tile tile = it.next();
        // ...
    }
}
You then can safely have multiple threads modifying the list, which seems to be your case.
The list returned by Collections.synchronizedList() saves you from having to use the synchronized block (above) for single operations on that list, like add(e), size(), get(i), and so on...

LinkedList Vs ConcurrentLinkedQueue

Currently, in a multithreaded environment, we are using a LinkedList to hold data. Sometimes in the logs we get a NoSuchElementException while polling the LinkedList. Please help me understand the performance impact if we move from LinkedList to a ConcurrentLinkedQueue implementation.
Thanks,
Sachin
When you get a NoSuchElementException, it may be because you are not synchronizing properly.
For example: you check with it.hasNext() whether an element is in the list and afterwards try to fetch it with it.next(). This can fail when the element has been removed in between, and it can happen even when you use synchronized versions of the Collection API.
So your problem cannot really be solved by moving to ConcurrentLinkedQueue. You may not get an exception, but you have to be prepared for null to be returned even though you checked beforehand that the queue was not empty. (This is still the same error; only the implementation differs.) It remains true as long as there is no proper synchronization in YOUR code putting the emptiness check and the element retrieval in the SAME synchronized scope.
There is a good chance that you simply trade the NoSuchElementException for a new NullPointerException afterwards.
This may not be an answer directly addressing your question about performance, but having NoSuchElementException in LinkedList as a reason to move to ConcurrentLinkedQueue sounds a bit strange.
Edit
Some pseudo-code for broken implementations:
//list is a LinkedList
if (!list.isEmpty()) {
    ... list.getFirst()
}
Some pseudo-code for proper sync:
//list is a LinkedList
synchronized (list) {
    if (!list.isEmpty()) {
        ... list.getFirst()
    }
}
Some code for "broken" sync (does not work as intended).
This maybe the result of directly switching from LinkedList to CLQ in the hope of getting rid of synchronization on your own.
//queue is instance of CLQ
if (!queue.isEmpty()) { // Does not really make sense, because ...
    ... queue.poll() // May return null! Good chance for NPE here!
}
Some proper code:
//queue is instance of CLQ
element = queue.poll();
if (element != null) {
    ...
}
or
//queue is instance of CLQ
synchronized (queue) {
    if (!queue.isEmpty()) {
        ... queue.poll() // is not null
    }
}
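The poll-then-check pattern above can be wrapped in an ordinary consumer loop; a single poll() per element leaves no check-then-act window:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DrainLoop {
    // Consume with one atomic "check and take" per element.
    static List<String> drain(Queue<String> queue) {
        List<String> out = new ArrayList<>();
        String item;
        while ((item = queue.poll()) != null) { // null means "empty right now"
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<String> q = new ConcurrentLinkedQueue<>();
        q.add("a");
        q.add("b");
        System.out.println(drain(q)); // prints [a, b]
    }
}
```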
ConcurrentLinkedQueue [is] an unbounded, thread-safe, FIFO-ordered queue. It uses a linked structure, similar to those we saw in Section 13.2.2 as the basis for skip lists, and in Section 13.1.1 for hash table overflow chaining. We noticed there that one of the main attractions of linked structures is that the insertion and removal operations implemented by pointer rearrangements perform in constant time. This makes them especially useful as queue implementations, where these operations are always required on cells at the ends of the structure, that is, cells that do not need to be located using the slow sequential search of linked structures.
ConcurrentLinkedQueue uses a CAS-based wait-free algorithm, that is, one that guarantees that any thread can always complete its current operation, regardless of the state of other threads accessing the queue. It executes queue insertion and removal operations in constant time, but requires linear time to execute size. This is because the algorithm, which relies on co-operation between threads for insertion and removal, does not keep track of the queue size and has to iterate over the queue to calculate it when it is required.
From Java Generics and Collections, ch. 14.2.
Note that ConcurrentLinkedQueue does not implement the List interface, so it suffices as a replacement for LinkedList only if the latter was used purely as a queue. In this case, ConcurrentLinkedQueue is obviously a better choice. There should be no big performance issue unless its size is frequently queried. But as a disclaimer, you can only be sure about performance if you measure it within your own concrete environment and program.

Traversing a Binary Tree with multiple threads

So I'm working on a speed contest in Java. I have (number of processors) threads doing work, and they all need to add to a binary tree. Originally I just used a synchronized add method, but I wanted to make it so threads could follow each other through the tree (each thread only has the lock on the object it's accessing). Unfortunately, even for a very large file (48,000 lines), my new binary tree is slower than the old one. I assume this is because I'm getting and releasing a lock every time I move in the tree. Is this the best way to do this or is there a better way?
Each node has a ReentrantLock named lock, and getLock() and releaseLock() just call lock.lock() and lock.unlock();
My code:
public void add(String sortedWord, String word) {
    synchronized (this) {
        if (head == null) {
            head = new TreeNode(sortedWord, word);
            return;
        }
        head.getLock();
    }
    TreeNode current = head, previous = null;
    while (current != null) {
        // If this is an anagram of another word in the list..
        if (current.getSortedWord().equals(sortedWord)) {
            current.add(word);
            current.releaseLock();
            return;
        }
        // New word is less than current word
        else if (current.compareTo(sortedWord) > 0) {
            previous = current;
            current = current.getLeft();
            if (current != null) {
                current.getLock();
                previous.releaseLock();
            }
        }
        // New word greater than current word
        else {
            previous = current;
            current = current.getRight();
            if (current != null) {
                current.getLock();
                previous.releaseLock();
            }
        }
    }
    if (previous.compareTo(sortedWord) > 0) {
        previous.setLeft(sortedWord, word);
    }
    else {
        previous.setRight(sortedWord, word);
    }
    previous.releaseLock();
}
EDIT: Just to clarify, my code is structured like this: the main thread reads input from a file and adds the words to a queue; each worker thread pulls words from the queue and does some work (including sorting them and adding them to the binary tree).
Another thing: there definitely is no place for a binary tree in performance-critical code. The caching behaviour will kill all performance. It should have a much larger fan-out (one cache line). [edit] With a binary tree you access too much non-contiguous memory. Take a look at the material on Judy trees.
And you probably want to start with a radix of at least one character before starting the tree.
And do the compare on an int key instead of a string first.
And perhaps look at tries
And get rid of all the threads and synchronization. Just try to make the problem memory-access bound.
[edit]
I would do this a bit differently. I would use a thread for each first character of the string, and give each its own BTree (or perhaps a Trie). I'd put a non-blocking work queue in each thread and fill them based on the first character of the string. You can get even more performance by presorting the add queue and doing a merge sort into the BTree. In the BTree, I'd use int keys representing the first 4 characters, only referring to the strings in the leaf pages.
In a speed contest, you hope to be memory access bound, and therefore have no use for threads. If not, you're still doing too much processing per string.
I would actually start by looking at the use of compare() and equals() and see if something can be improved there. You might wrap your String object in another class with a different compare() method, optimized for your use case. For instance, consider using hashCode() instead of equals(). The hash code is cached, so future calls will be that much faster.
Consider interning the strings. I don't know if the VM will accept that many strings, but it's worth checking out.
(this was going to be a comment to an answer but got too wordy).
When reading the nodes you need to get a read-lock for each node as you reach it. If you read-lock the whole tree then you gain nothing.
Once you reach the node you want to modify, you release the read lock for that node and try to acquire the write lock. Code would be something like:
TreeNode current; // add a ReentrantReadWriteLock to each node.
// enter the current node:
current.getLock().readLock().lock();
if (isTheRightPlace(current)) {
    current.getLock().readLock().unlock();
    current.getLock().writeLock().lock(); // NB: getLock returns a ConcurrentRWLock
    // do stuff then release lock
    current.getLock().writeLock().unlock();
} else {
    current.getLock().readLock().unlock();
}
You may try using an upgradeable read/write lock (maybe it's called an upgradeable shared lock or the like; I do not know what Java provides): use a single RWLock for the whole tree. Before traversing the B-Tree, you acquire the read (shared) lock and you release it when done (one acquire and one release in the add method, not more).
At the point where you have to modify the B-Tree, you acquire the write (exclusive) lock (or "upgrade" from read to write lock), insert the node and downgrade to read (shared) lock.
With this technique the synchronization for checking and inserting the head node can also be removed!
It should look somehow like this:
public void add(String sortedWord, String word) {
    lock.read();
    if (head == null) {
        lock.upgrade();
        head = new TreeNode(sortedWord, word);
        lock.downgrade();
        lock.unlock();
        return;
    }
    TreeNode current = head, previous = null;
    while (current != null) {
        if (current.getSortedWord().equals(sortedWord)) {
            lock.upgrade();
            current.add(word);
            lock.downgrade();
            lock.unlock();
            return;
        }
        .. more tree traversal, do not touch the lock here ..
        ...
    }
    if (previous.compareTo(sortedWord) > 0) {
        lock.upgrade();
        previous.setLeft(sortedWord, word);
        lock.downgrade();
    }
    else {
        lock.upgrade();
        previous.setRight(sortedWord, word);
        lock.downgrade();
    }
    lock.unlock();
}
Unfortunately, after some googling I could not find a suitable "upgradeable" rwlock for Java. ReentrantReadWriteLock is not upgradeable; however, instead of upgrading you can unlock the read lock, then lock the write lock, and (very important) re-check the condition that led to these lines (e.g. if (current.getSortedWord().equals(sortedWord)) {...}). This is important because another thread may have changed things between the read unlock and the write lock.
For details, check this question and its answers.
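A sketch of that unlock/lock/re-check dance with the standard ReentrantReadWriteLock (the cached field and its condition are placeholders, not the asker's tree code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RecheckExample {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String cache; // guarded by lock

    String getOrCompute() {
        lock.readLock().lock();
        try {
            if (cache != null) return cache; // fast path under the shared lock
        } finally {
            lock.readLock().unlock();
        }
        // gap here: another thread may write before we get the write lock
        lock.writeLock().lock();
        try {
            if (cache == null) {             // re-check: someone else may have won
                cache = "computed";
            }
            return cache;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```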
In the end the traversal of the B-tree will run in parallel. Only when a target node is found, the thread acquires an exclusive lock (and other threads will block only for the time of the insertion).
Locking and unlocking is overhead, and the more you do it, the slower your program will be.
On the other hand, decomposing a task and running portions in parallel will make your program complete more quickly.
Where the "break-even" point lies is highly-dependent on the amount of contention for a particular lock in your program, and the system architecture on which the program is run. If there is little contention (as there appears to be in this program) and many processors, this might be a good approach. However, as the number of threads decreases, the overhead will dominate and a concurrent program will be slower. You have to profile your program on the target platform to determine this.
Another option to consider is a non-locking approach using immutable structures. Rather than modifying a list, for example, you could append the old (linked) list to a new node, then with a compareAndSet operation on an AtomicReference, ensure that you won the data race to set the words collection in current tree node. If not, try again. You could use AtomicReferences for the left and right children in your tree nodes too. Whether this is faster or not, again, would have to be tested on your target platform.
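Here is a minimal sketch of that idea (the names are made up): an immutable cons list held in an AtomicReference, where a failed compareAndSet simply means another thread won the race and we retry against the new head:

```java
import java.util.concurrent.atomic.AtomicReference;

class CasList {
    static final class Cell {
        final String word;
        final Cell next;
        Cell(String w, Cell n) { word = w; next = n; }
    }

    private final AtomicReference<Cell> head = new AtomicReference<>();

    void add(String word) {
        for (;;) {
            Cell old = head.get();
            Cell updated = new Cell(word, old);  // new list sharing the old tail
            if (head.compareAndSet(old, updated)) return; // won the data race
            // lost the race: loop and retry against the new head
        }
    }

    int size() {
        int n = 0;
        for (Cell c = head.get(); c != null; c = c.next) n++;
        return n;
    }
}
```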
Considering one dataset per line, 48k lines isn't all that much, and you can only make wild guesses as to how your operating system and the virtual machine will mangle your file IO to make it as fast as possible.
Trying to use a producer/consumer paradigm can be problematic here, as you have to balance the overhead of locks against the actual amount of IO carefully. You might get better performance if you just improve the way you do the file IO (consider something like mmap()).
I would say that the doing it this way is not the way to go, without even taking the synchronization performance issues into account.
The fact that this implementation is slower than the original fully synchronized version may be a problem, but a bigger problem is that the locking in this implementation is not at all robust.
Imagine, for example, that you pass null in for sortedWord; this will result in a NullPointerException being thrown, which means you end up holding the lock in the current thread and leaving your data structure in an inconsistent state. On the other hand, if you just synchronize this method, you don't have to worry about such things. Considering the synchronized version is faster as well, it's an easy choice to make.
You seem to have implemented a Binary Search Tree, not a B-Tree.
Anyway, have you considered using a ConcurrentSkipListMap? This is an ordered data structure (introduced in Java 6), which should have good concurrency.
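For the anagram use case in the question, a sketch of how that might look (the key and helper names are illustrative): the sorted letters become the ordered key, and the word list hangs off each entry:

```java
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class AnagramMap {
    // sorted-letters key -> list of anagrams, using the ordered,
    // concurrent ConcurrentSkipListMap instead of a hand-locked tree
    static final ConcurrentSkipListMap<String, List<String>> MAP =
            new ConcurrentSkipListMap<>();

    static void add(String sortedWord, String word) {
        MAP.computeIfAbsent(sortedWord, k -> new CopyOnWriteArrayList<>()).add(word);
    }

    public static void main(String[] args) {
        add("aelpp", "apple");
        add("aelpp", "appel");
        add("abt", "bat");
        System.out.println(MAP.firstKey()); // keys stay sorted: prints abt
    }
}
```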
I've got a dumb question: since you're reading and modifying a file, you're going to be totally limited by how fast the read/write head can move around and the disk can rotate. So what good is it to use threads and processors? The disc can't do two things at once.
Or is this all in RAM?
ADDED: OK, it's not clear to me how much parallelism can help you here (some, maybe), but regardless, what I would suggest is squeezing every cycle you can out of each thread. This is what I'm talking about: for example, I wonder if innocent-looking sleeper code like those calls to "get" and "compare" methods is taking more of a percentage of the time than you might expect. If it is, you might be able to do each of them once rather than 2 or 3 times - that sort of thing.

BlockingQueue - blocked drainTo() methods

BlockingQueue has a method called drainTo(), but it does not block. I need a queue that blocks but also lets me retrieve queued objects in a single method.
Object first = blockingQueue.take();
if (blockingQueue.size() > 0)
    blockingQueue.drainTo(list);
I guess the above code will work but I'm looking for an elegant solution.
Are you referring to the comment in the JavaDoc:
Further, the behavior of this operation is undefined if the specified collection
is modified while the operation is in progress.
I believe that this refers to the collection list in your example:
blockingQueue.drainTo(list);
meaning that you cannot modify list at the same time you are draining from blockingQueue into list. However, the blocking queue synchronizes internally, so that when drainTo is called, puts and (see note below) gets will block. If it did not do this, it would not be truly thread-safe. You can look at the source code and verify that drainTo is thread-safe with regard to the blocking queue itself.
Alternately, do you mean that when you call drainTo that you want it to block until at least one object has been added to the queue? In that case, you have little choice other than:
list.add(blockingQueue.take());
blockingQueue.drainTo(list);
to block until one or more items have been added, and then drain the entire queue into the collection list.
Note: As of Java 7, a separate lock is used for gets and puts. Put operations are now permitted during a drainTo (and a number of other take operations).
If you happen to use Google Guava, there's a nifty Queues.drain() method.
Drains the queue as BlockingQueue.drainTo(Collection, int), but if the
requested numElements elements are not available, it will wait for
them up to the specified timeout.
I found this pattern useful.
List<byte[]> blobs = new ArrayList<byte[]>();
if (queue.drainTo(blobs, batch) == 0) {
    blobs.add(queue.take());
}
With the API available, I don't think you are going to get much more elegant than that, other than that you can remove the size test.
If you are wanting to atomically retrieve a contiguous sequence of elements even if another removal operation coincides, I don't believe even drainTo guarantees that.
Source code:
public int drainTo(Collection<? super E> c) {
    // arg. check
    lock.lock();
    try {
        for (n = 0; n != count; n++) {
            c.add(items[n]);
        }
        if (n > 0) {
            notFull.signalAll();
        }
        return n;
    } finally {
        lock.unlock();
    }
}
ArrayBlockingQueue is eager to return 0. BTW, it could do it before taking the lock.
