LRU Cache Implementation in Java


I have seen the following code, and I think there is a useless while loop in the implementation of the addElement method. The cache should never hold more than size+1 elements, since addElement already holds the write lock.
So why does addElement keep removing elements until this condition becomes false?
while(concurrentLinkedQueue.size() >=maxSize)
Any pointers on this would be great.
Here is the implementation:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LRUCache<K,V> {
    // Keys ordered from least recently used (head) to most recently used (tail).
    private ConcurrentLinkedQueue<K> concurrentLinkedQueue = new ConcurrentLinkedQueue<K>();
    private ConcurrentHashMap<K,V> concurrentHashMap = new ConcurrentHashMap<K, V>();
    private ReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    private Lock readLock = readWriteLock.readLock();
    private Lock writeLock = readWriteLock.writeLock();
    int maxSize=0;

    public LRUCache(final int MAX_SIZE){
        this.maxSize=MAX_SIZE;
    }

    public V getElement(K key){
        // The read lock is shared, so several threads can run this method at once.
        readLock.lock();
        try {
            V v=null;
            // containsKey, not contains: ConcurrentHashMap.contains(Object) tests values.
            if(concurrentHashMap.containsKey(key)){
                // Move the key to the tail of the queue (most recently used).
                concurrentLinkedQueue.remove(key);
                v= concurrentHashMap.get(key);
                concurrentLinkedQueue.add(key);
            }
            return v;
        }finally{
            readLock.unlock();
        }
    }

    public V removeElement(K key){
        writeLock.lock();
        try {
            V v=null;
            if(concurrentHashMap.containsKey(key)){
                v=concurrentHashMap.remove(key);
                concurrentLinkedQueue.remove(key);
            }
            return v;
        } finally {
            writeLock.unlock();
        }
    }

    public V addElement(K key,V value){
        writeLock.lock();
        try {
            if(concurrentHashMap.containsKey(key)){
                concurrentLinkedQueue.remove(key);
            }
            // Evict least recently used keys until there is room for one more.
            while(concurrentLinkedQueue.size() >=maxSize){
                K queueKey=concurrentLinkedQueue.poll();
                concurrentHashMap.remove(queueKey);
            }
            concurrentLinkedQueue.add(key);
            concurrentHashMap.put(key, value);
            return value;
        } finally{
            writeLock.unlock();
        }
    }
}

The point here, I guess, is that you need to check whether the LRU cache is at its maximum size. The check here is NOT (map.size() > maxSize), it is ">=". Now, you could probably replace that with "if (map.size() == maxSize) {...}", which, in ideal conditions, should do exactly the same thing.
But in not-so-ideal conditions, if for whatever reason somebody put an EXTRA entry in the map without checking, then with that code the map would NEVER go down in size again, because the if condition would never be true.
So why not "while" and ">=" instead of "if" and "=="? It is the same amount of code, and it is more robust against "unexpected" conditions.

A simple implementation of an LRU cache does the following. A while loop is only needed when the maximum size is adjusted, not for the primitive operations:
During put, remove the superfluous element.
During get, move the element to the top.
The primitive operations are then one-shot. You can use either ordinary synchronized or a read-write lock around this data structure.
When using a read-write lock, fairness (who comes first) is a property of the lock that is used rather than of the LRU cache itself.
Here is a sample implementation.
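One way to sketch such an implementation (not necessarily the one this answer had in mind) is to lean on java.util.LinkedHashMap in access order, which does "move to top" on get and gives a hook for dropping one entry on put. The class name SimpleLruCache and its methods are made up for illustration, and plain synchronized is assumed to be good enough:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; a minimal sketch, not a tuned implementation.
class SimpleLruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    SimpleLruCache(final int maxSize) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Consulted once after each insertion, so at most one entry is evicted.
                return size() > maxSize;
            }
        };
    }

    // get() reorders the map, so it needs the same exclusive lock as put().
    synchronized V get(K key) { return map.get(key); }

    synchronized V put(K key, V value) { return map.put(key, value); }

    synchronized V remove(K key) { return map.remove(key); }

    synchronized int size() { return map.size(); }
}
```

Because a get in access order modifies the map, a read lock would not be enough here; every operation takes the same monitor.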

It's not wrong but just a safety in case of accidental modification. You could check for equality with concurrentLinkedQueue.size() == maxSize in a conditional statement.

Related

Java - Compare and Swap and synchronized Block

public class SimulatedCAS {
    private int value;

    public synchronized int get() { return value; }

    // Sets value to newValue only if it equals expectedValue; returns the value that was seen.
    public synchronized int compareAndSwap(int expectedValue, int newValue)
    {
        int oldValue = value;
        if (oldValue == expectedValue)
            value = newValue;
        return oldValue;
    }
}

public class CasCounter
{
    private final SimulatedCAS value = new SimulatedCAS();

    public int getValue()
    {
        return value.get();
    }

    public int increment()
    {
        int v = value.get();
        // Retry until no other thread changed the value between the read and the swap.
        while (v != value.compareAndSwap(v, v + 1))
        {
            v = value.get();
        }
        return v + 1;
    }
}
I referred to the book "Java Concurrency in Practice".
A counter must be incremented by multiple threads. I tried the compare-and-swap approach, but in the end it relies on the synchronized keyword, which might again result in threads blocking and waiting. Using a synchronized block gives me the same performance. Can anybody state what the difference is between using compare and swap and a synchronized block? Or is there any other way to implement compare and swap without using a synchronized block?

I need to increment counter with multiple threads
The AtomicInteger class is good for that.
You can create it with final AtomicInteger i=new AtomicInteger(initial_value); Then you can call i.set(new_value) to set its value, and you can call i.get() to get its value, and most importantly for your application, you can call i.incrementAndGet() to atomically increment the value.
If N different threads all call i.incrementAndGet() at "the same time," then
Each thread is guaranteed to see a different return value, and
The final value after they're all done is guaranteed to have increased by exactly N.
The AtomicInteger class has quite a few other methods as well. Most of them make useful guarantees about what happens when multiple threads access the variable.
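A minimal sketch of that guarantee, assuming a made-up helper AtomicCounterDemo.countWithThreads that starts a few threads and has each one call incrementAndGet() a fixed number of times:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; only AtomicInteger itself comes from the JDK.
class AtomicCounterDemo {
    // Starts `threads` workers that each increment a shared counter `perThread` times.
    static int countWithThreads(int threads, int perThread) {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no lock needed
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every worker before reading the total
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }
}
```

With 4 threads and 10000 increments each, the returned total is 40000; with a plain int and value++ it could come out lower.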

Real compare-and-swap is an optimistic technique. It writes the new value only if the variable still holds the expected value, and the caller retries if another thread changed the variable in the meantime. So, if the variable is rarely contended, CAS tends to perform better than synchronized.
But if the variable is modified often by many threads, synchronized can perform better, because it doesn't allow anything to touch the variable while it is being changed, so there are no failed attempts to retry.
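As a rough sketch of a CAS counter without any synchronized block, the retry loop from the question can be written against AtomicInteger.compareAndSet. The class name LockFreeCounter is made up for illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name; same shape as CasCounter, but backed by AtomicInteger.
class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int getValue() {
        return value.get();
    }

    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // Succeeds only if no other thread changed the value since it was read.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Otherwise another thread won the race; read again and retry.
        }
    }
}
```

In practice incrementAndGet() already does this for you; the explicit loop is only there to show where the retry happens.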

Volatile and ArrayBlockingQueue and perhaps other concurrent objects

I understand (or at least I think I do;) ) the principle behind volatile keyword.
When looking into ConcurrentHashMap source, you can see that all nodes and values are declared volatile, which makes sense because the value can be written/read from more than one thread:
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
    ...
}
However, looking into ArrayBlockingQueue source, it's a plain array that is being updated/read from multiple threads:
private void enqueue(E x) {
    // assert lock.getHoldCount() == 1;
    // assert items[putIndex] == null;
    final Object[] items = this.items;
    items[putIndex] = x;
    if (++putIndex == items.length)
        putIndex = 0;
    count++;
    notEmpty.signal();
}
How is it guaranteed that the value written to items[putIndex] will be visible to another thread, given that the elements of the array are not volatile? (I know that declaring the array itself volatile has no effect on the elements.)
Couldn't another thread hold a cached copy of the array?
Thanks

Notice that enqueue is private. Look for all calls to it (offer(E), offer(E, long, TimeUnit), put(E)). Notice that every one of those looks like:
public void put(E e) throws InterruptedException {
    checkNotNull(e);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        // Do stuff.
        enqueue(e);
    } finally {
        lock.unlock();
    }
}
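So you can conclude that every call to enqueue is protected by lock.lock() ... lock.unlock(). You don't need volatile, because lock() and unlock() also act as memory barriers: everything a thread wrote before unlock() is visible to the next thread that acquires the same lock.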
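To make that concrete, here is a stripped-down sketch in the same style: a plain, non-volatile array whose reads and writes all happen while holding one ReentrantLock. LockedRingBuffer is a made-up name, and this is not the JDK source, only an illustration of the pattern:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class; mirrors the locking pattern, not the real ArrayBlockingQueue.
class LockedRingBuffer {
    private final Object[] items; // plain array, no volatile anywhere
    private int putIndex, takeIndex, count;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();

    LockedRingBuffer(int capacity) {
        items = new Object[capacity];
    }

    void put(Object x) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == items.length) notFull.await();
            items[putIndex] = x; // written while holding the lock
            if (++putIndex == items.length) putIndex = 0;
            count++;
            notEmpty.signal();
        } finally {
            lock.unlock(); // publishes the writes to the next thread that locks
        }
    }

    Object take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == 0) notEmpty.await();
            Object x = items[takeIndex]; // read while holding the same lock
            items[takeIndex] = null;
            if (++takeIndex == items.length) takeIndex = 0;
            count--;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```

A consumer thread calling take() sees what a producer thread stored with put(), because the producer's unlock happens-before the consumer's subsequent lock on the same ReentrantLock.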

As I understand it, volatile is not needed because lock-based BlockingQueue implementations such as ArrayBlockingQueue already have a locking mechanism, unlike ConcurrentHashMap.
If you look at the public methods of the queue, you will find a ReentrantLock that guards against concurrent access.

Java Synchronization - Mutex.wait vs List.wait

While using Java threading primitives to construct a thread-safe bounded queue, what's the difference between these two constructs?
Creating an explicit lock object.
Using the list as the lock and waiting on it.
Example of 1
private final Object lock = new Object();
private final int maxSize = 10; // assumed capacity; not declared in the original snippet
private ArrayList<String> list = new ArrayList<String>();

public String dequeue() throws InterruptedException {
    synchronized (lock) {
        while (list.size() == 0) {
            lock.wait();
        }
        String value = list.remove(0);
        lock.notifyAll();
        return value;
    }
}

public void enqueue(String value) throws InterruptedException {
    synchronized (lock) {
        while (list.size() == maxSize) {
            lock.wait();
        }
        list.add(value);
        lock.notifyAll();
    }
}
Example of 2
private final int maxSize = 10; // assumed capacity; not declared in the original snippet
private ArrayList<String> list = new ArrayList<String>();

public String dequeue() throws InterruptedException {
    synchronized (list) { // lock on list
        while (list.size() == 0) {
            list.wait(); // wait on list
        }
        String value = list.remove(0);
        list.notifyAll();
        return value;
    }
}

public void enqueue(String value) throws InterruptedException {
    synchronized (list) { // lock on list
        while (list.size() == maxSize) {
            list.wait(); // wait on list
        }
        list.add(value);
        list.notifyAll();
    }
}
Note
This is a bounded list
No other operation is being performed apart from enqueue and dequeue.
I could use a blocking queue, but this question is more for improving my limited knowledge of threading.
If this question is repeated please let me know.

The short answer is, no, there is no functional difference, other than the extra memory overhead of maintaining that extra lock object. However, there are a couple of semantics-related items I would consider before making a final decision.
Will I ever need to perform synchronized operations on more than just my internal list?
Let's say you wanted to maintain a parallel data structure to your ArrayList, such that all operations on the list and that parallel data structure needed to be synchronized. In this case, it might be best to use the external lock, as locking on either the list or the structure might be confusing to future development efforts on this class.
Will I ever give access to my list outside of my queue class?
Let's say you wanted to provide an accessor method for your list, or make it visible to extensions of your Queue class. If you were using an external lock object, classes that retrieved references to the list would never be able to perform thread-safe operations on that list. In that case, it'd be better to synchronize on the list and make it clear in the API that external accesses/modifications to the list must also synchronize on that list.
I'm sure there are more reasons why you might choose one over the other, but these are the two big ones I can think of.
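To illustrate the first consideration, here is a small sketch where one private lock object guards both the list and a second piece of state. The class name BoundedStringQueue and the totalEnqueued counter are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; one explicit lock covers two pieces of state.
class BoundedStringQueue {
    private final Object lock = new Object();           // guards list and totalEnqueued
    private final List<String> list = new ArrayList<>();
    private long totalEnqueued;                          // the "parallel" state
    private final int maxSize;

    BoundedStringQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    void enqueue(String value) throws InterruptedException {
        synchronized (lock) {
            while (list.size() == maxSize) {
                lock.wait();
            }
            list.add(value);
            totalEnqueued++;   // updated under the same lock as the list
            lock.notifyAll();
        }
    }

    String dequeue() throws InterruptedException {
        synchronized (lock) {
            while (list.isEmpty()) {
                lock.wait();
            }
            String value = list.remove(0);
            lock.notifyAll();
            return value;
        }
    }

    long totalEnqueued() {
        synchronized (lock) {
            return totalEnqueued;
        }
    }
}
```

Locking on list here would work just as well mechanically; the separate lock only makes it clearer that the list is not the only thing being protected.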

Java synchronization on Collection with expensive operations

I have a map named synchronizedMap that I synchronize on in my function doMapOperation. In this function, I need to add/remove items from the map and perform expensive operations on these objects. I know that I don't want to call an expensive operation in a synchronized block, but I don't know how to make sure that the map is in a consistent state while I do these operations. What is the right way to do this?
This is my initial layout which I am sure is wrong because you want to avoid calling an expensive operation in a synchronized block:
public void doMapOperation(Object key1, Object key2) {
    synchronized (synchronizedMap) {
        // Remove key1 if it exists.
        if (synchronizedMap.containsKey(key1)) {
            Object value = synchronizedMap.get(key1);
            value.doExpensiveOperation(); // Shouldn't be in synchronized block.
            synchronizedMap.remove(key1);
        }
        // Add key2 if necessary.
        Object value = synchronizedMap.get(key2);
        if (value == null) {
            value = new Object(); // assign to the existing variable rather than redeclaring it
            synchronizedMap.put(key2, value);
        }
        value.doOtherExpensiveOperation(); // Shouldn't be in synchronized block.
    } // End of synchronization.
}
I guess as a continuation of this question, how would you do this in a loop?
public void doMapOperation(Object... keys) {
    synchronized (synchronizedMap) {
        // Loop through keys and remove them.
        for (Object key : keys) {
            // Check if map has key, remove if key exists, add if key doesn't.
            if (synchronizedMap.containsKey(key)) {
                Object value = synchronizedMap.get(key);
                value.doExpensiveOperation(); // Shouldn't be here.
                synchronizedMap.remove(key);
            } else {
                Object value = new Object();
                value.doAnotherExpensiveOperation(); // Shouldn't be here.
                synchronizedMap.put(key, value);
            }
        }
    } // End of synchronization block.
}
Thanks for the help.

You can do the expensive operations outside your synchronized block like so:
public void doMapOperation(Object... keys) {
    ArrayList<Object> contained = new ArrayList<Object>();
    ArrayList<Object> missing = new ArrayList<Object>();
    synchronized (synchronizedMap) {
        for (Object key : keys) {
            if (synchronizedMap.containsKey(key)) {
                // Remember the removed value so it can be processed after the lock is released.
                contained.add(synchronizedMap.remove(key));
            } else {
                Object value = new Object();
                missing.add(value);
                synchronizedMap.put(key, value);
            }
        }
    }
    for (Object o : contained)
        o.doExpensiveOperation();
    for (Object o : missing)
        o.doAnotherExpensiveOperation();
}
The only disadvantage is you may be performing operations on values after they are removed from the synchronizedMap.
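Since Object has no such methods, here is a compilable sketch of the same collect-then-process idea. ExpensiveResource, MapOperations and the peek helper are stand-ins invented for the example, with trivial bodies in place of the real expensive work:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type standing in for the objects stored in the map.
class ExpensiveResource {
    volatile String lastOperation = "none";
    void doExpensiveOperation() { lastOperation = "expensive"; }
    void doAnotherExpensiveOperation() { lastOperation = "another"; }
}

// Hypothetical owner of the map from the question.
class MapOperations {
    private final Map<Object, ExpensiveResource> synchronizedMap = new HashMap<>();

    void doMapOperation(Object... keys) {
        List<ExpensiveResource> removed = new ArrayList<>();
        List<ExpensiveResource> added = new ArrayList<>();
        synchronized (synchronizedMap) {
            // Only cheap map bookkeeping happens while the lock is held.
            for (Object key : keys) {
                ExpensiveResource existing = synchronizedMap.remove(key);
                if (existing != null) {
                    removed.add(existing);
                } else {
                    ExpensiveResource created = new ExpensiveResource();
                    synchronizedMap.put(key, created);
                    added.add(created);
                }
            }
        }
        // The expensive calls run after the lock has been released.
        for (ExpensiveResource r : removed) r.doExpensiveOperation();
        for (ExpensiveResource r : added) r.doAnotherExpensiveOperation();
    }

    ExpensiveResource peek(Object key) {
        synchronized (synchronizedMap) {
            return synchronizedMap.get(key);
        }
    }
}
```

Note that a newly added value is visible to other threads through the map before doAnotherExpensiveOperation() has run on it, which is the other side of the disadvantage mentioned above.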

You can create a wrapper for your synchronizedMap and make sure the operations like containsKey, remove, and put are synchronized methods. Then only access to the map will be synchronized, while your expensive operations can take place outside the synchronized block.
Another advantage is by keeping your expensive operations outside the synchronized block you avoid a possible deadlock risk if the operations call another synchronized map method.

In the first snippet: declare the two values outside the if-clause, and just assign them inside it. Make the if-clause synchronized, and invoke the expensive operations outside.
In the second case do the same, but inside the loop (synchronized inside the loop). You can, of course, have only one synchronized statement, outside the loop, and simply fill a List of objects on which to invoke the expensive operation. Then, in a second loop, outside the synchronized block, invoke that operation on all values in the list.

We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass
up our opportunities in that critical 3%. A good programmer will not
be lulled into complacency by such reasoning, he will be wise to look
carefully at the critical code; but only after that code has been
identified. — Donald Knuth
You have a single method, doMapOperation(). What is your performance if this method continues to be block-synchronized? If you don't know then how will you know when you've got a good performing solution? Are you prepared to handle multiple calls to your expensive operations even after they have been removed from the map?
I'm not trying to be condescending, since maybe you understand the problem at hand better than you've conveyed, but it seems like you're jumping into a level of optimization for which you may not be prepared and may not be necessary.

You can actually do it all with only one synchronization hit. The first remove is probably the easiest. If you know the object exists, and you know remove is atomic, why not just remove it and if what is returned is not null invoke the expensive operations?
// Remove key1 if it exists.
if (synchronizedMap.containsKey(key1)) {
    Object value = synchronizedMap.remove(key1);
    if(value != null){ //thread has exclusive access to value
        value.doExpensiveOperation();
    }
}
For the put, since it is expensive and should be atomic, you are pretty much out of luck and need to synchronize access. I would recommend using some kind of computing map. Take a look at google-collections and MapMaker.
You can create a ConcurrentMap that builds the expensive object based on your key, for example:
ConcurrentMap<Key, ExpensiveObject> expensiveObjects = new MapMaker()
    .concurrencyLevel(32)
    .makeComputingMap(
        new Function<Key, ExpensiveObject>() {
            public ExpensiveObject apply(Key key) {
                return createNewExpensiveObject(key);
            }
        });
This is simply a form of memoization.
In both of these cases, you don't need to use synchronized at all (at least not explicitly).
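On Java 8 and later, a similar memoizing map can be sketched without an extra library, using ConcurrentHashMap.computeIfAbsent, which runs the mapping function at most once per absent key. This swaps a JDK method in for the MapMaker computing map above. ExpensiveObjectCache, the String standing in for ExpensiveObject, and the creations counter are assumptions made up for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class; String stands in for the ExpensiveObject type.
class ExpensiveObjectCache {
    private final ConcurrentHashMap<String, String> expensiveObjects = new ConcurrentHashMap<>();
    final AtomicInteger creations = new AtomicInteger(); // counts how often the factory ran

    private String createNewExpensiveObject(String key) {
        creations.incrementAndGet();
        return "value-for-" + key;
    }

    String get(String key) {
        // Creates the value only if the key is absent; later calls reuse it.
        return expensiveObjects.computeIfAbsent(key, this::createNewExpensiveObject);
    }
}
```

Other threads asking for the same key while the value is being created wait for it instead of creating a second copy, so the factory should be kept reasonably short and must not update the same map.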

If you don't have null values in the Map, you don't need the containsKey() call at all: you can use Map.remove() to both remove the item and tell you whether it was there. So the true content of your synchronized block only needs to be this:
Value value = map.remove(key);
if (value != null)
    value.doExpensiveOperation();
else
{
    value = new Value();
    value.doExpensiveOperation();
    map.put(key,value);
}
If the expensive operation itself doesn't need to be synchronized, i.e. if you don't mind other clients of the Map seeing the value while it is being operated on, you can further simplify to this:
Value value = map.remove(key);
if (value == null)
{
    value = new Value();
    map.put(key,value);
}
value.doExpensiveOperation();
and the synchronized block can terminate before the expensive operation.

When to use synchronized

I'm wondering what the reason is for synchronizing the code below. I don't think a deadlock could occur?
private final Object lock = new Object();
private Hashtable content = new Hashtable();

public void deleteContent(Object key){
    synchronized(lock){
        if(content.containsKey(key)){
            content.remove(key);
        }
    }
}

public Object getContent(Object key){
    synchronized(lock){
        return (Object) content.get(key);
    }
}

I have no idea.
The implementation of Hashtable is already synchronized and the remove method does nothing if the key isn't in the table. So all synchronized blocks can be removed (also the containsKey check).
Maybe the lock is used elsewhere in the code and is there for a reason. (?)

There is a race condition between containsKey() and remove(). A lock avoids the race condition.
However, it's rather pointless because you can just call remove() alone.

You are correct -- if they were to synchronize it, they should do synchronized(content), which is what all Hashtable methods are synchronized on.
Also that cast to (Object) shows whoever wrote this has only read the cover of a Java book.
This is just as good:
private Hashtable content = new Hashtable();

public void deleteContent(Object key){
    content.remove(key);
}

public Object getContent(Object key){
    return content.get(key);
}

If that hashtable is being accessed concurrently by different methods, retrieval or deletion of elements would have to be synchronized to prevent concurrent modification!
