ConcurrentModificationException when updating stored Iterator (for LRU cache implementation)

ConcurrentModificationException when updating stored Iterator (for LRU cache implementation) - java

I am trying to implement my own LRU cache. Yes, I know that Java provides a LinkedHashMap for this purpose, but I am trying to implement it using basic data structures.
From reading about this topic, I understand that I need a HashMap for O(1) lookup of a key and a linked list for management of the "least recently used" eviction policy. I found these references that all use a standard library hashmap but implement their own linked list:
"What data structures are commonly used for LRU caches and quickly
locating objects?" (stackoverflow.com)
"What is the best way to Implement a LRU Cache?" (quora.com)
"Implement a LRU Cache in C++" (uml.edu)
"LRU Cache (Java)" (programcreek.com)
The hash table is supposed to directly store a linked list Node as I show below. My cache should store Integer keys and String values.
However, in Java the LinkedList collection does not expose its internal nodes, so I can't store them inside the HashMap. I could instead have the HashMap store indices into the LinkedList, but then getting to an item would require O(N) time. So I tried to store a ListIterator instead.
import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.LinkedList;
import java.util.ListIterator;
public class LRUCache {
private static final int DEFAULT_MAX_CAPACITY = 10;
protected Map<Integer, ListIterator> _map = new HashMap<Integer, ListIterator>();
protected LinkedList<String> _list = new LinkedList<String>();
protected int _size = 0;
protected int _maxCapacity = 0;
public LRUCache(int maxCapacity) {
_maxCapacity = maxCapacity;
}
// Put the key, value pair into the LRU cache.
// The value is placed at the head of the linked list.
public void put(int key, String value) {
// Check to see if the key is already in the cache.
ListIterator iter = _map.get(key);
if (iter != null) {
// Key already exists, so remove it from the list.
iter.remove(); // Problem 1: ConcurrentModificationException!
}
// Add the new value to the front of the list.
_list.addFirst(value);
_map.put(key, _list.listIterator(0));
_size++;
// Check if we have exceeded the capacity.
if (_size > _maxCapacity) {
// Remove the least recently used item from the tail of the list.
_list.removeLast();
}
}
// Get the value associated with the key.
// Move value to the head of the linked list.
public String get(int key) {
String result = null;
ListIterator iter = _map.get(key);
if (iter != null) {
//result = iter
// Problem 2: HOW DO I GET THE STRING FROM THE ITERATOR?
}
return result;
}
public static void main(String argv[]) throws Exception {
LRUCache lruCache = new LRUCache(10);
lruCache.put(10, "This");
lruCache.put(20, "is");
lruCache.put(30, "a");
lruCache.put(40, "test");
lruCache.put(30, "some"); // Causes ConcurrentModificationException
}
}
So this leads to three problems:
Problem 1: I am getting a ConcurrentModificationException when I update the LinkedList using the iterator that I store in the HashMap.
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
at java.util.LinkedList$ListItr.remove(LinkedList.java:919)
at LRUCache.put(LRUCache.java:31)
at LRUCache.main(LRUCache.java:71)
Problem 2. How do I retrieve the value pointed to by the ListIterator? It seems I can only retrieve the next() value.
Problem 3. Is there any way to implement this LRU cache using the Java collections LinkedList, or do I really have to implement my own linked list?

1) This isn't really what Iterators are for.
By contract, if you modify the list without using the iterator -- as you do here
_list.addFirst(value);
then ALL OPEN ITERATORS on that list should throw ConcurrentModificationException. They were open to a version of the list that no longer exists.
2) A LinkedList is not, exactly, a linked list of nodes. It's a java.util.List, whose backing implementation is a doubly linked list of nodes. That List contract is why it does not expose references to the backing implementation -- so operations like "remove this node, as a node, and move it to the head" are no good. This encapsulation is for your own protection (same as the concurrent mod exception) -- it allows your code to rely on the List semantics of a LinkedList (iterability, for example) without worry that some joker two cubes away was hacking away at its innards and broke the contract.
3) What you really need here is NOT a LinkedList. What you need is a Stack that that allows you to move any arbitrary entry to the head and dump the tail. You are implying that you want a fast seek time to an arbitrary entry and also a fast remove and a fast add, AND you want to be able to find the tail at any moment in case you need to remove it.
Fast seek time == HashSomething
Fast add/remove of arbitrary elements == LinkedSomething
Fast addressing of the final element == SomekindaList
4) You're going to need to build your own linking structure...or use a LinkedHashMap.
PS LinkedHashSet is cheating, it's implemented using a LinkedHashMap.

I'll deal with problem 3 first:
As you point out in your question, LinkedList (like all well designed generic collections) hides the details of the implementation such as the nodes containing the links. In your case you need your hash map to reference these links directly as the values of the map. To do otherwise (e.g. having indirection through a third class) would defeat the purpose of an LRU cache to allow very low overhead on value access. But this is not possible with standard Java Collections - they don't (and shouldn't) provide direct access to internal structures.
So the logical conclusion of this is that, yes, you need to implement your own way of storing the order in which items in the cache have been used. That doesn't have to be a double-linked list. Those have traditionally been used for LRU caches because the most common operation is to move a node to the top of the list when it is accessed. That is an incredibly cheap operation in a double-linked list requiring just four nodes to be relinked with no memory allocation or free.
Problem 1 & 2:
Essentially the root cause here is that this you are trying to use iterators as a cursor. They are designed to be created, stepped through to perform some operation and then disposed of. Even if you get over the problems you are having I expect there will be further problems behind them. You're putting a square peg in a round hole.
So my conclusion is that you need to implement your own way to hold values in a class that keeps track of order of access. However it can be incredibly simple: only three operations are required: create, get value and remove from tail. Both create and get value must move the node to the head of the list. No inserting or deleting from the middle of the list. No deleting the head. No searching. Honestly dead simple.
Hopefully this will get you started :-)
public class <K,V> LRU_Map implements Map<K,V> {
private class Node {
private final V value;
private Node previous = null;
private Node next = null;
public Node(V value) {
this.value = value;
touch();
if (tail == null)
tail = this;
}
public V getValue() {
touch();
return value;
}
private void touch() {
if (head != this) {
unlink();
moveToHead();
}
}
private void unlink() {
if (tail == this)
tail = prev;
if (prev != null)
prev.next = next;
if (next != null)
next.prev = prev;
}
private void moveToHead() {
prev = null;
next = head;
head = this;
}
public void remove() {
assert this == tail;
assert this != head;
assert next == null;
if (prev != null)
prev.next = null;
tail = prev;
}
}
private final Map<K,Node> map = new HashMap<>();
private Node head = null;
private Node tail = null;
public void put(K key, V value) {
if (map.size() >= MAX_SIZE) {
assert tail != null;
tail.remove();
}
map.put(key, new Node(value));
}
public V get(K key) {
if (map.containsKey(key))
return map.get(key).getValue();
else
return null;
}
// and so on for other Map methods
}

Another way to skin this cat would be to implement a very simple class that extends the LinkedList, but runs any modifications to the list (e.g. add, remove, etc) inside of a "synchronized" block. You'll need to run your HashMap pseudo-pointer through the get() every time, but it should work just fine. e.g.
...
private Object lock = new Object(); //semaphore
//override LinkedList's implementations...
#Override
public <T> remove(int index) { synchronized(lock) { return super.remove(index); } }
...
If you have Eclipse or IntelliJ IDEA, then you should be able to auto-generate the method stubs you need almost instantly, and you can evaluate which ones need to be locked.

Related

Is there a hashmap implementation that uses a caching scheme?

I swear that in the past I had seen something about a hashmap implementation using some type of caching but when I was reading up today on how hashmaps are implemented in java, it was just simply a table with linked lists. Let me go deeper in what I mean.
From what I read today, hashmap in java is essentially like so
There exists an array simply called "table" where each index of the array is the hashcode. The value of the array is the first element of the linked list for that hashcode.
When you try to retrieve an object from the hashmap using the key, the key is transformed into a hashcode which is applied as the index of the "table" to then go to the linkedlist to iterate and find the correct object corresponding to the key.
But what I had read before was something different than this. What I had read was when you retrieve an object from the hashmap using the key, the corresponding bucket is cached so that when you retrieve another object from the hashmap from the same bucket, you are using the cache. But when you retrieve an object from a different bucket, the other bucket is cached instead.
Did I just completely misunderstand something in the past and invent something in my head, or is there something like this that might have confused me?

Never heard of that.
First of all: the "table" array has only a certain size so the hascode is not used directly. From the HashMap source tab[i = (n - 1) & hash]
Maybee you are mixing it up with the LinkedHashMap which keeps track of the element accesses:
void afterNodeAccess(Node<K,V> e) { // move node to last
LinkedHashMap.Entry<K,V> last;
if (accessOrder && (last = tail) != e) {
LinkedHashMap.Entry<K,V> p =
(LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
p.after = null;
if (b == null)
head = a;
else
b.after = a;
if (a != null)
a.before = b;
else
last = b;
if (last == null)
head = p;
else {
p.before = last;
last.after = p;
}
tail = p;
++modCount;
}
}
This behavior is usefull if you are implementing a LRU (Least recently used) cache. A LRU cache removes the elements that have not been requested for the longest period, once the cache reached its maximum size.

Reversing a linked list in Java without changing the original

I am trying to reverse a linked list which works, but when I try to print the original, it fails (only prints the head). My question is, why would reversing affect the original. Below is my code. LinkedList is my own class, so is Node. Before reversing, if I try to print my list, that works.
public static void main(String[] args) {
LinkedList list;
...
Node head = list.getHead();
Node rev = reverse(head);
Node temp = rev;
while (temp != null) {
System.out.println(temp);
temp = temp.next;
}
temp = head;
while (temp != null) {
System.out.println(temp);
temp = temp.next;
}
}
private static reverse(Node head) {
// Reversing the linked list
}
EDIT::
This seems to be a Java thing. Java passes an object by reference. When I pass head as a parameter, it's passed by reference and any change made to it is reflected in the calling function.
Doing Node h = head and then passing h as a parameter won't work either since h will be the same object as head.
The only option I can think of is to create a new object, copy the linked list over and pass that as a parameter.
My question becomes, is there a better solution?

To understand it, picture your list looks like this
list (list.head =a) --> a (a.next=b) --> b (b.next= c) -> c (c.next = null)
If you get the head, then you are getting object 'a'.
Then you are modifying object 'a'.
So you can see you are editing the list by doing this.
What you need to do is:
Get the head
Create a copy
Reverse the copy
Get the next item
Copy it
Reverse it
Join it to the last
And so on
Since you are using your own classes not java collections classes, the easiest way is for you to make sure that reverseNode() only edits a copy of the one you pass it, and returns the copy.
First make sure your Node class has a constructor that copies another Node, then do something like this:
private static Node reverse(Node original)
{
Node retval = new Node(original);
// or you could use clone () as Bhavik suggested, if your Node class implements it
// modify retval
// I haven't shown code to reverse it as I assume you already have that and you didnt specify if it was a bidirectional list or just one direction.
return retval;
}
Or you might add a static method in your Node class that constructs a new node that is reversed:
static Node createReverse(Node n)
{
return new Node(n.data,n.next,n.prior);
}
Or a non static method of the node class which returns a reversed copy of itself;
Node createReverse()
{
return new Node(this.data,this.next,this.prior);
}
But you should consider this can get very ugly because your copies will still have pointers pointing into the existing list!
A better technique might be to create a new empty list, and then start from the end of your original, make a copy, and add it to the start of your new list.
You can use recursion to do this but might easily run out of memory.
But rather than do this manually, you might look at the java.util packages and switch to using one of their LinkedList and list item types. These classes have already solved all the problems with doing this stuff.
Then you could (if you need to keep the original list unmodified):
- Make a copy of your entire list.
- reverse the copy as below
If you don't care about keeping the original, then just use this method(below) on your list, no need then to make a copy.
From java.util.Collections:
Collections.reverse(List a_list);
The Collections class will choose an efficient way to reverse the list, depending on whether it is bidirectional, single directional, etc.

Chunko's answer is right in that you should be creating copies of your Nodes instead of just passing references to them.
However, I think you might be overthinking this:
LinkedList reversedList = new LinkedList();
for (int i = originalList.size()-1; i >= 0; i--) {
reversedList.add(list.get(i));
}
Java's LinkedList doesn't have a getHead() method, so I guess you're using some homework-related custom implementation of a linked list. Still, your implementation should have an add() method that appends a new Node to the list, as well as a get(int) method that returns the item in a given position.
You can use those methods to create a new list and populate it with the original list's items in reversed order.

duplicating a node in a linked list java

just to start off, this is homework and thank you for your assistance ahead of time. I keep getting stuck on little problems so I am hoping you guys can help me with one. What I am trying to do is create a linked list that has multiples functions. The one I am having trouble with is sorting(I can do the other ones). Each node holds a string, an integer and a double. I need to be able to sort by each of these and by the order it was inputted, on the user's request. ***It is also important to mention that the variables in my object are private and my object is called list1. Basically, I have to make one linked list for the chronological order and one for each other order.
My plan is to insert the nodes in their correct order as the user inputs them. So as the user inputs a node, that node needs to go in the correct place in the chronological list and in the other lists. So, I need to copy the node to do this. However, I cannot simply just say
icopy(copy for integer) = newNode(node the user just inputted)
That only changes the address. When I went to my instructor he told me that I should say:
icopy.data = newNode.data;
("data" being the shortcut way of mentioning that I need to get the individual data types within the node.) So I wrote:
icopy.GetI() = newNode.GetI();
When I do this I encounter this error: unexpected type required:variable, found:value. I am not sure what to do. Any assistance would be appreciated and I would be happy to clarify anything.
*GetI: method in my object that gives access to the integer value in each node.
*p: pointer for the Chronological
*pi: pointer for the integer.
*fi: front of the integer linked list
public static void main(String args[])
{
String repeat = "y";
boolean inserted = false;
list1 fChr = null;
list1 p = fChr;
list1 icopy = null;
list1 scopy = null;
list1 dcopy = null;
list1 fd = fChr;//front of the double list
list1 fi = null;//front of the integer list
list1 fStr = fChr;//front of the string list~
while(repeat.equals("y"))//while the user agrees to adding a new node
{
if(fChr == null)// if the front is empty
{
fChr = new list1();//create a new node by calling object and sets it as the front
}
else
{
p = fChr;
while(p.next != null)//finds the end of the Linked list
{
p = p.next;//moves the pointer p down the list
}
list1 newNode = new list1();
icopy.GetI() = newNode.GetI();// make a copy of newNode
p.next = nexNode;//put in chronological order
while(p != null)
{
if(fi == null)
{
fi = n;
}
else if(n.GetI() < fi.GetI)//check at beginning
{
//put at beginning
}
else if(icopy.GetI() < p.next.GetI())//check in between nodes
{
//put in between
}
//does it go at the end
}
}
repeat = JOptionPane.showInputDialog("Would you like to add a node [y/n]");
}
PrintMenu(fChr, fi, fd, fStr);// sends the user to the menu screen
}

There are a few things here that you are not understanding. Firstly, in Java iCopy.getI() = ... makes no sense. When a method returns a value it needs to be assigned to a variable if you wish to change it. If you want to change the instance variable you need a separate method called something like iCopy.setI().
It sounds as though you're not asking for help with the sorting so I'll restrict my answer to creating copies of the list.
What your professor is getting at is that the easiest way to ensure the data is consistent in your several linked lists is to separate the class storing the data from the nodes of the list. So I would expect your class structure to end up looking something like:
class Data {
private final int intValue;
private final String strValue;
private final double doubleValue;
}
class Node {
private final Data data;
private Node next;
public Node(Data data) {
this.data = data;
this.next = null;
}
}
Now if you want to create a new linked list with the same data as the old one then you can add a constructor to Node that creates a reference to the original data:
class Node {
public Node copy() {
Node copy = new Node(data);
if (next != null)
copy.next = next.copy();
return copy;
}
}
Hopefully you can see what that does: it creates a new node referencing the same data as this one and then uses recursion to copy the rest of the list.
Now creating each of the sort orders could look like:
Node listByInt = list.copy();
/* code to sort listByInt according to data.intValue */
Add a comment if you want some hints on sorting as well but I suggest you get your code to the point of having equal copies of lists before attempting that.
As a final note, you don't necessarily need to have separate linked lists to solve this problem. An alternative would be to store the original insertion order in the node. You could then sort by any order (including original insertion order) before printing the list. Personally I'd prefer that as a solution unless there are performance issues (e.g. you need to use each sorted list many times).

Is this an efficient way to remove duplicates from a linked list?

I am writing a function that will take in the head of a linked list, remove all duplicates, and return the new head. I've tested it but I want to see if you can catch any bugs or improve on it.
removeDuplicates(Node head)
if(head == null) throw new RuntimeException("Invalid linked list");
Node cur = head.next;
while(cur != null) {
if(head.data == cur.data) {
head = head.next;
} else {
Node runner = head;
while(runner.next != cur) {
if(runner.next.data == cur.data) {
runner.next = runner.next.next;
break;
}
runner = runner.next;
}
cur = cur.next;
}
return head;
}

If you are willing to spend a little more RAM on the process, you can make it go much faster without changing the structure.
For desktop apps, I normally favor using more RAM and winning some speed. So I would do something like this.
removeDuplicates(Node head) {
if (head == null) {
throw new RuntimeException("Invalid List");
}
Node current = head;
Node prev = null;
Set<T> data = new HashSet<T>(); // where T is the type of your data and assuming it implements the necessary methods to be added to a Set properly.
while (current != null) {
if (!data.contains(current.data)) {
data.add(current.data);
prev = current;
current = current.next;
} else {
if (prev != null) {
prev.next = current.next;
current = current.next;
}
}
}
}
This should run in O(n) time.
EDIT
I hope I was correct in assuming that this is some kind of project / homework where you are being forced to use a linked list, otherwise, as noted, you would be better off using a different data structure.

I didn't check your code for bugs, but I do have a suggestion for improving it. Allocate a Hashtable or HashMap that maps Node to Boolean. As you process each element, if it is not a key in the hash, add it (with Boolean.TRUE as the value). If it does exist as a key, then it already appeared in the list and you can simply remove it.
This is faster than your method because hash lookups work in roughly constant time, while you have an inner loop that has to go down the entire remainder of the list for each list element.
Also, you might consider whether using an equals() test instead of == makes better sense for your application.

To efficiently remove duplicates you should stay away from linked list: Use java.util.PriorityQueue instead; it is a sorted collection for which you can define the sorting-criteria. If you always insert into a sorted collection removing duplicates can be either done directly upon insertion or on-demand with a single O(n)-pass.

Aside from using the elements of the list to create a hash map and testing each element by using it as a key, which would only be desirable for a large number of elements, where large depends on the resources required to create the hash map, sequentially scanning the list is a practical option, but there are others which will be faster. See user138170's answer here - an in-place merge sort is an O(n log(n)) operation which does not use an extra space, whereas a solution using separately-allocated array would work in O(n) time. Practically, you may want to profile the code and settle for a reasonable value of n, where n is the number of elements in the list, after which a solution allocating memory will be used instead of one which does not.
Edit: If efficiency is really important, I would suggest not using a linked list (largely to preserve cache-coherency) and perhaps using JNI and implementing the function natively; the data would also have to be supplied in a natively-allocated buffer.

Performance considerations for keySet() and entrySet() of Map

All,
Can anyone please let me know exactly what are the performance issues between the 2? The site : CodeRanch provides a brief overview of the internal calls that would be needed when using keySet() and get(). But it would be great if anyone can provide exact details about the flow when keySet() and get() methods are used. This would help me understand the performance issues better.

The most common case where using entrySet is preferable over keySet is when you are iterating through all of the key/value pairs in a Map.
This is more efficient:
for (Map.Entry entry : map.entrySet()) {
Object key = entry.getKey();
Object value = entry.getValue();
}
than:
for (Object key : map.keySet()) {
Object value = map.get(key);
}
Because in the second case, for every key in the keySet the map.get() method is called, which - in the case of a HashMap - requires that the hashCode() and equals() methods of the key object be evaluated in order to find the associated value*. In the first case that extra work is eliminated.
Edit: This is even worse if you consider a TreeMap, where a call to get is O(log(n)), i.e. the comparator may need to run log2(n) times (n = size of the Map) before finding the associated value.
*Some Map implementations have internal optimisations that check the objects' identity before the hashCode() and equals() are called.

First of all, this depends entirely on which type of Map you're using. But since the JavaRanch thread talks about HashMap, I'll assume that that's the implementation you're referring to. And let's assume also that you're talking about the standard API implementation from Sun/Oracle.
Secondly, if you're concerned about performance when iterating through your hash map, I suggest you have a look at LinkedHashMap. From the docs:
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
HashMap.entrySet()
The source-code for this implementation is available. The implementation basically just returns a new HashMap.EntrySet. A class which looks like this:
private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
public Iterator<Map.Entry<K,V>> iterator() {
return newEntryIterator(); // returns a HashIterator...
}
// ...
}
and a HashIterator looks like
private abstract class HashIterator<E> implements Iterator<E> {
Entry<K,V> next; // next entry to return
int expectedModCount; // For fast-fail
int index; // current slot
Entry<K,V> current; // current entry
HashIterator() {
expectedModCount = modCount;
if (size > 0) { // advance to first entry
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null);
}
}
final Entry<K,V> nextEntry() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
Entry<K,V> e = next;
if (e == null)
throw new NoSuchElementException();
if ((next = e.next) == null) {
Entry[] t = table;
while (index < t.length && (next = t[index++]) == null);
}
current = e;
return e;
}
// ...
}
So there you have it... That's the code dictating what will happen when you iterate through an entrySet. It walks through the entire array, which is as long as the map's capacity.
HashMap.keySet() and .get()
Here you first need to get hold of the set of keys. This takes time proportional to the capacity of the map (as opposed to size for the LinkedHashMap). After this is done, you call get() once for each key. Sure, in the average case, with a good hashCode-implementation this takes constant time. However, it will inevitably require lots of hashCode() and equals() calls, which will obviously take more time than just doing a entry.value() call.

Here is the link to an article comparing the performance of entrySet(), keySet() and values(), and advice regarding when to use each approach.
Apparently the use of keySet() is faster (besides being more convenient) than entrySet() as long as you don't need to Map.get() the values.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.