Guava CacheBuilder removal listener

Please show me what I'm missing.
I have a cache built by CacheBuilder inside a DataPool. DataPool is a singleton whose instance various threads can get and act on. Right now I have a single thread that produces data and adds it to this cache.
Here is the relevant part of the code:
private InputDataPool() {
    cache = CacheBuilder.newBuilder()
        .expireAfterWrite(1000, TimeUnit.NANOSECONDS)
        .removalListener(new RemovalListener() {
            {
                logger.debug("Removal Listener created");
            }
            public void onRemoval(RemovalNotification notification) {
                System.out.println("Going to remove data from InputDataPool");
                logger.info("Following data is being removed:" + notification.getKey());
                if (notification.getCause() == RemovalCause.EXPIRED) {
                    logger.fatal("This data expired:" + notification.getKey());
                } else {
                    logger.fatal("This data didn't expired but evacuated intentionally" + notification.getKey());
                }
            }
        })
        .build(new CacheLoader() {
            @Override
            public Object load(Object key) throws Exception {
                logger.info("Following data being loaded" + (Integer) key);
                Integer uniqueId = (Integer) key;
                return InputDataPool.getInstance().getAndRemoveDataFromPool(uniqueId);
            }
        });
}
public static InputDataPool getInstance() {
    if (clsInputDataPool == null) {
        synchronized (InputDataPool.class) {
            if (clsInputDataPool == null) {
                clsInputDataPool = new InputDataPool();
            }
        }
    }
    return clsInputDataPool;
}
The call made from that thread is as simple as
while (true) {
    inputDataPool.insertDataIntoPool(inputDataPacket);
    // call some logic that comes with inputDataPacket, then sleep for 2 seconds
}
where insertDataIntoPool looks like
void insertDataIntoPool(InputDataPacket inputDataPacket) {
    cache.get(inputDataPacket.getId());
}
Now the question: an element in the cache is supposed to expire after 1000 nanoseconds. So when inputDataPool.insertDataIntoPool is called a second time, the data inserted the first time should be evicted, because the call happens 2 seconds after its insertion and it must have expired by then. The removal listener should then be called.
But this is not happening. I looked at the cache stats and evictionCount is always zero, no matter how many times cache.get(id) is called.
Importantly, though, if I extend insertDataIntoPool like this
void insertDataIntoPool(InputDataPacket inputDataPacket) {
    cache.get(inputDataPacket.getId());
    try {
        Thread.sleep(2000);
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
    cache.get(inputDataPacket.getId());
}
then the eviction takes place as expected and the removal listener is called.
I'm clueless at the moment about what I'm missing that would explain this behaviour. Please help if you can see something.
P.S. Please ignore any typos. Also, no checks are made and no generics are used, because this is just a test of the CacheBuilder functionality.
Thanks

As explained in the Javadoc and in the user guide, there is no thread that makes sure entries are removed from the cache as soon as the delay has elapsed. Instead, entries are removed during write operations, and occasionally during read operations if writes are rare. This allows for high throughput and low latency. And of course, not every write operation triggers a cleanup:
Caches built with CacheBuilder do not perform cleanup and evict values
"automatically," or instantly after a value expires, or anything of
the sort. Instead, it performs small amounts of maintenance during
write operations, or during occasional read operations if writes are
rare.
The reason for this is as follows: if we wanted to perform Cache
maintenance continuously, we would need to create a thread, and its
operations would be competing with user operations for shared locks.
Additionally, some environments restrict the creation of threads,
which would make CacheBuilder unusable in that environment.
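To make the quoted behaviour concrete, here is a minimal toy model of lazy expiration. It is not Guava's implementation: the class LazyExpiryCache, its manual clock and its BiConsumer listener are all hypothetical, and it only piggybacks maintenance on writes (Guava may also do some on reads). It shows why an expired entry can already be invisible to readers while the removal listener has not been called yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy cache (NOT Guava): expired entries are only removed during maintenance. */
final class LazyExpiryCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAt;
        Entry(V value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, Entry<V>> entries = new LinkedHashMap<>();
    private final long ttl;
    private final BiConsumer<K, V> removalListener;
    private long now = 0; // hypothetical manual clock, so the demo needs no sleeping

    LazyExpiryCache(long ttl, BiConsumer<K, V> removalListener) {
        this.ttl = ttl;
        this.removalListener = removalListener;
    }

    void advanceClock(long ticks) {
        now += ticks;
    }

    void put(K key, V value) {
        cleanUp(); // maintenance piggybacks on the write
        entries.put(key, new Entry<>(value, now));
    }

    V getIfPresent(K key) {
        Entry<V> entry = entries.get(key);
        // An expired entry is hidden from readers but not yet removed or reported.
        return (entry == null || isExpired(entry)) ? null : entry.value;
    }

    /** Removes every expired entry and notifies the listener for each one. */
    void cleanUp() {
        List<K> expiredKeys = new ArrayList<>();
        for (Map.Entry<K, Entry<V>> mapEntry : entries.entrySet()) {
            if (isExpired(mapEntry.getValue())) {
                expiredKeys.add(mapEntry.getKey());
            }
        }
        for (K key : expiredKeys) {
            Entry<V> removed = entries.remove(key);
            removalListener.accept(key, removed.value);
        }
    }

    private boolean isExpired(Entry<V> entry) {
        return now - entry.writtenAt >= ttl;
    }
}
```

In this model, putting "a", advancing the clock past the TTL and reading "a" returns null without any notification; the listener only fires on the next put or on an explicit cleanUp().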

I had the same issue, and I found this in Guava's documentation for CacheBuilder.removalListener:
Warning: after invoking this method, do not continue to use this cache
builder reference; instead use the reference this method returns. At
runtime, these point to the same instance, but only the returned
reference has the correct generic type information so as to ensure
type safety. For best results, use the standard method-chaining idiom
illustrated in the class documentation above, configuring a builder
and building your cache in a single statement. Failure to heed this
advice can result in a ClassCastException being thrown by a cache
operation at some undefined point in the future.
So by changing your code to use the builder reference returned after adding the removalListener, this problem can be resolved:
CacheBuilder builder = CacheBuilder.newBuilder()
    .expireAfterWrite(1000, TimeUnit.NANOSECONDS)
    .removalListener(new RemovalListener() {
        {
            logger.debug("Removal Listener created");
        }
        public void onRemoval(RemovalNotification notification) {
            System.out.println("Going to remove data from InputDataPool");
            logger.info("Following data is being removed:" + notification.getKey());
            if (notification.getCause() == RemovalCause.EXPIRED) {
                logger.fatal("This data expired:" + notification.getKey());
            } else {
                logger.fatal("This data didn't expired but evacuated intentionally" + notification.getKey());
            }
        }
    });
cache = builder.build(new CacheLoader() {
    @Override
    public Object load(Object key) throws Exception {
        logger.info("Following data being loaded" + (Integer) key);
        Integer uniqueId = (Integer) key;
        return InputDataPool.getInstance().getAndRemoveDataFromPool(uniqueId);
    }
});
This problem will be resolved. It is kind of weird, but I guess it is what it is :)

Related

Ideas on concurrent datastructure

I am not sure I can put my question in the clearest fashion, but I will try my best.
Let's say I am retrieving some information from a third-party API. The retrieved information will be huge. To gain performance, instead of retrieving all the info in one go, I will retrieve it in a paged fashion (the API gives me that facility, basically an iterator). The return type is basically a list of objects.
My aim is to process the information I already have in hand (comparing it, storing it in a DB and many other operations) while I wait for the next paged response.
My question to the expert community is: what data structure do you prefer in such a case? Also, does a framework like Spring Batch help you gain performance in such cases?
I know the question is a bit vague, but I am looking for general ideas, tips and pointers.

In these cases, the data structure for me is java.util.concurrent.CompletionService.
For purposes of example, I'm going to assume a couple of additional constraints:
You want only one outstanding request to the remote server at a time
You want to process the results in order.
Here goes:
// a class that knows how to update the DB given a page of results
class DatabaseUpdater implements Callable<Object> { ... }
// a background thread to do the work
final CompletionService<Object> exec = new ExecutorCompletionService<Object>(
    Executors.newSingleThreadExecutor());
// first call
List<Object> results = ThirdPartyAPI.getPage( ... );
// start loading those results into the DB on the background thread
exec.submit(new DatabaseUpdater(results));
while ( you need to ) {
    // another call to the remote service (reuses the variable declared above)
    results = ThirdPartyAPI.getPage( ... );
    // wait for the existing work to complete; get() rethrows any failure
    exec.take().get();
    // send more work to the background thread
    exec.submit(new DatabaseUpdater(results));
}
// wait for the last task to complete
exec.take().get();
This is just a simple two-thread design. The first thread is responsible for getting data from the remote service and the second is responsible for writing to the database.
Note that exec.take() only returns the completed Future; an exception thrown by DatabaseUpdater reaches the main thread, wrapped in an ExecutionException, when get() is called on that Future.
Good luck.
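For readers who want something they can compile, here is a minimal self-contained sketch of the same two-thread pipeline. The class PagedPipeline, its processAll method and the use of an Iterator of pages in place of the third-party API are assumptions made for illustration; they are not part of the answer above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class PagedPipeline {
    private PagedPipeline() {}

    /**
     * Fetches pages on the calling thread while one background thread
     * processes the previously fetched page. Returns the sum of the
     * per-page results reported by pageProcessor.
     */
    static int processAll(Iterator<List<String>> pages,
                          Function<List<String>, Integer> pageProcessor) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletionService<Integer> completion = new ExecutorCompletionService<>(worker);
        int total = 0;
        boolean pending = false; // true while a page is being processed in the background
        try {
            while (pages.hasNext()) {
                List<String> page = pages.next(); // stands in for the slow remote call
                if (pending) {
                    // Wait for the previous page; get() rethrows its failure, if any.
                    total += completion.take().get();
                }
                completion.submit(() -> pageProcessor.apply(page));
                pending = true;
            }
            if (pending) {
                total += completion.take().get(); // wait for the last page
            }
            return total;
        } catch (ExecutionException e) {
            throw new IllegalStateException("page processing failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a page", e);
        } finally {
            worker.shutdown();
        }
    }
}
```

At most one page is in flight at a time, and pages are processed in the order they were fetched, which matches the two constraints stated in the answer.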

In terms of doing the actual parallelism, one very useful construct in Java is the ThreadPoolExecutor. A rough sketch of what that might look like is this:
public class YourApp {
    // static, so that it can be instantiated from the static main method
    static class Processor implements Runnable {
        Widget toProcess;
        public Processor(Widget toProcess) {
            this.toProcess = toProcess;
        }
        public void run() {
            // commit the Widget to the DB, etc
        }
    }
    public static void main(String[] args) {
        // core size == maximum size: with an unbounded queue the pool
        // never grows beyond its core size
        ThreadPoolExecutor executor =
            new ThreadPoolExecutor(10, 10, 30,
                TimeUnit.SECONDS,
                new LinkedBlockingDeque<Runnable>());
        while (thereAreStillWidgets()) {
            ArrayList<Widget> widgets = doExpensiveDatabaseCall();
            for (Widget widget : widgets) {
                Processor processor = new Processor(widget);
                executor.execute(processor);
            }
        }
        executor.shutdown(); // finish the queued work, then release the threads
    }
}
But as I said in a comment: calls to an external API are expensive. It's very likely that the best strategy is to pull all the Widget objects down from the API in one call, and then process them in parallel once you've got them. Doing more API calls gives you the overhead of sending the data all the way from the server to you, every time -- it's probably best to pay that cost the fewest number of times that you can.
Also, keep in mind that if you're doing DB operations, it's possible that your DB doesn't allow for parallel writes, so you might get a slowdown there.
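One detail is worth checking when sizing a ThreadPoolExecutor for this kind of work: the executor only creates threads beyond corePoolSize when the work queue refuses a task, so with an unbounded queue the maximumPoolSize is never reached. The sketch below (the class PoolSizingDemo and its method are hypothetical names) measures this with getLargestPoolSize().

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolSizingDemo {
    private PoolSizingDemo() {}

    /** Runs short tasks on a pool with an unbounded queue and reports the peak thread count. */
    static int largestPoolSize(int corePoolSize, int maximumPoolSize, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                corePoolSize, maximumPoolSize, 30, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>()); // unbounded queue
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                public void run() {
                    sleepQuietly(20);
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executor.getLargestPoolSize();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(largestPoolSize(1, 10, 20)); // prints 1
        System.out.println(largestPoolSize(4, 4, 20));  // prints 4
    }
}
```

If real parallelism is wanted with an unbounded queue, setting the core size equal to the intended number of threads is the simplest option.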

Guava CacheBuilder doesn't call removal listener

I want: to be notified when an entry is removed because its timeout expired.
I tried: setting a removal listener.
Problem: the removal listener does not seem to work correctly. It only fires when I put new items into the cache (see code below).
Question: how can I make the removal listener fire without putting new items?
Code:
My loading cache:
LoadingCache<String, Integer> cache = CacheBuilder.newBuilder()
    .maximumSize(10000)
    .expireAfterAccess(ACCESS_TIMEOUT, TimeUnit.MILLISECONDS)
    .removalListener(
        new RemovalListener<String, Integer>() {
            //PROBLEM: THIS METHOD IS NEVER CALLED!!!
            public void onRemoval(RemovalNotification<String, Integer> notification) {
                if (notification.getCause() == RemovalCause.EXPIRED) {
                    System.out.println("Value " + notification.getValue() + " has been expired");
                } else {
                    System.out.println("Just removed for some reason");
                }
            }
        }
    )
    .build(
        new CacheLoader<String, Integer>() {
            public Integer load(String key) throws Exception {
                return Integer.valueOf(-1);
            }
        });
How I use the cache in a separate thread:
cache.put("key", 100);
Thread.sleep(ACCESS_TIMEOUT / 2);
System.out.println(cache.getIfPresent("key")); // returns 100
Thread.sleep(ACCESS_TIMEOUT * 5);
//CRUCIAL LINE: cache.put("key2", 200); // invokes the removal listener
System.out.println(cache.getIfPresent("key")); // returns null
// And again: the problem is that the entry has already expired, but the removal listener isn't called until I add a new item to the cache.
P.S: I can share the complete demo at GitHub if you need, just tell me

It's because Guava does not evict values automatically the moment the timeout expires. Instead, it does so in the course of read and write operations.
Per its documentation:
Caches built with CacheBuilder do not perform cleanup and evict values
"automatically," or instantly after a value expires, or anything of
the sort. Instead, it performs small amounts of maintenance during
write operations, or during occasional read operations if writes are
rare.
The reason for this is as follows: if we wanted to perform Cache
maintenance continuously, we would need to create a thread, and its
operations would be competing with user operations for shared locks.
Additionally, some environments restrict the creation of threads,
which would make CacheBuilder unusable in that environment.
To check that onRemoval is called on expiration, call cache.cleanUp() right before your second read operation; that should trigger your onRemoval.
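If the listener has to fire reasonably close to the expiry time even when the cache is idle, one option is to schedule the maintenance yourself with a ScheduledExecutorService. The sketch below uses only the JDK: the class PeriodicCleanup and its start method are hypothetical names, and the Runnable passed in is assumed to be something like cache::cleanUp for a Guava cache.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class PeriodicCleanup {
    private PeriodicCleanup() {}

    /**
     * Runs the given maintenance task repeatedly on a daemon thread.
     * The caller should shut the returned scheduler down when the cache is discarded.
     */
    static ScheduledExecutorService start(Runnable cleanUp, long periodMillis) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
                    public Thread newThread(Runnable runnable) {
                        Thread thread = new Thread(runnable, "cache-cleanup");
                        thread.setDaemon(true); // do not keep the JVM alive just for maintenance
                        return thread;
                    }
                });
        // Fixed delay: the next run starts periodMillis after the previous one finished.
        scheduler.scheduleWithFixedDelay(cleanUp, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A call such as PeriodicCleanup.start(cache::cleanUp, 1000) would then bound the notification delay to roughly one second, at the cost of one extra thread. Keep in mind that if the scheduled task throws, the executor suppresses all further runs.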

Deal with concurrent modification on List without having ConcurrentModificationException

I have a stateful EJB that calls a method of a stateless EJB to parse web pages.
Here is my stateful code:
@Override
public void parse() {
    while (true) {
        if (false == _activeMode) {
            break;
        }
        for (String url : _urls) {
            if (false == _activeMode) {
                break;
            }
            for (String prioritaryUrl : _prioritaryUrls) {
                if (false == _activeMode)
                    break;
                boursoramaStateless.parseUrl(prioritaryUrl);
            }
            boursoramaStateless.parseUrl(url);
        }
    }
}
No problem here.
I have some asynchronous calls (via JMS) that add values to my _urls variable (a List). The goal is to parse each new URL inside my infinite loop.
I get a ConcurrentModificationException when I add a new URL to my List from the JMS onMessage method, but it seems to work anyway, because the new URL is parsed.
When I try to wrap the loop body in a synchronized block:
while (true) {
    synchronized (_urls) {
        // code...
    }
}
my new URL is never parsed; I expected it to be parsed after the for() loop finished...
So my question is: how can I modify a List while it is being iterated in a loop without getting a ConcurrentModificationException?
I just want two threads to modify a shared resource at the same time without a synchronized block...

You may want a CopyOnWriteArrayList.

for (String s : urls) uses an Iterator internally. The iterator checks for concurrent modification so that its behavior is well defined.
You can use a for (int i = ... loop instead. This way no exception is thrown, and if elements are only added to the end of the List, you still get a consistent snapshot (the list as it existed at some time during the iteration). If elements in the list are moved around, you may miss entries.
If you want to use synchronized, you need to synchronize on both ends, but that way you lose concurrent reads.
If you want concurrent access AND consistent snapshots, you can use any of the collections in the java.util.concurrent package.
CopyOnWriteArrayList has already been mentioned. The other interesting ones are LinkedBlockingQueue and ArrayBlockingQueue (Collections but not Lists), but that's about all.
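A small sketch of the snapshot behaviour of CopyOnWriteArrayList follows. The class and method names are made up for the demo, and the list is modified from inside the loop on a single thread only to keep the example deterministic; the iterator behaves the same way when another thread does the adding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class SnapshotIterationDemo {
    private SnapshotIterationDemo() {}

    /** Adds to the list while iterating over it and returns what the loop actually visited. */
    static List<String> visitWhileAdding(List<String> urls) {
        List<String> visited = new ArrayList<>();
        for (String url : urls) {
            visited.add(url);
            if (urls.size() < 4) {
                urls.add(url + "/next"); // modification during iteration
            }
        }
        return visited;
    }

    /** The iterator works on the snapshot taken when the loop started: visits [a, b]. */
    static List<String> demoWithCopyOnWrite() {
        List<String> urls = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        return visitWhileAdding(urls);
    }

    public static void main(String[] args) {
        System.out.println(demoWithCopyOnWrite()); // prints [a, b]
        // The same call with a plain ArrayList throws ConcurrentModificationException.
    }
}
```

The added URLs are not lost: they are picked up by the next iteration over the list, which suits an outer while (true) loop like the one in the question. Every add copies the backing array, so this fits lists that are read far more often than they are written.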

ok thank you guys.
So I made some modifications.
1) added an iterator and kept the synchronized blocks (inside the parse() function and around the addUrl() function, which adds a new URL to my List)
--> it works like a charm, no ConcurrentModificationException thrown
2) added an iterator and removed the synchronized blocks
--> a ConcurrentModificationException is still thrown...
For now, I will read more about your answers and test your solutions.
Thank you again guys

First, forget about synchronized when running inside a Java EE container. It prevents the container from optimizing thread utilization and will not work in a clustered environment.
Second, it seems that your design is wrong. You should not update a private field of the bean from JMS; this is what causes the ConcurrentModificationException. You should probably modify your bean to retrieve the collection from a database, and your MDB to store the URL in that database.
Another solution, easier for you, is the following.
Retrieve the currently existing URLs and copy them to another collection, then iterate over that copy. When the global collection is updated via JMS, the update is not visible in the copy, so no exception is thrown:
while (true) {
    for (String url : copyUrls(_prioritaryUrls)) {
        // deal with url
    }
}
private List<String> copyUrls(List<String> urls) {
    return new ArrayList<String>(urls); // this creates a copy of the source list
}
//........
public void onMessage(Message message) {
    _prioritaryUrls.add(((TextMessage) message).getText());
}

Does RemovalListener callback in guava caching api make sure that no one is using the object

I read the code sample and documentation about caching on the wiki page. I see that the RemovalListener callback can be used to tear down evicted cached objects. My question is: does the library make sure that the object is not being used by any other thread before calling the provided RemovalListener? Let's consider the code example from the docs:
CacheLoader<Key, DatabaseConnection> loader =
    new CacheLoader<Key, DatabaseConnection>() {
        public DatabaseConnection load(Key key) throws Exception {
            return openConnection(key);
        }
    };
RemovalListener<Key, DatabaseConnection> removalListener =
    new RemovalListener<Key, DatabaseConnection>() {
        public void onRemoval(RemovalNotification<Key, DatabaseConnection> removal) {
            DatabaseConnection conn = removal.getValue();
            conn.close(); // tear down properly
        }
    };
return CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.MINUTES)
    .removalListener(removalListener)
    .build(loader);
Here the cache is configured to evict elements 2 minutes after creation (I understand that it may not be exactly two minutes, because eviction is piggybacked on user read/write calls). But whatever the timing, will the library check that no active reference to the object exists before passing it to the RemovalListener? Another thread may have fetched the object from the cache long ago and may still be using it. In that case I cannot call close() on it from the RemovalListener.
Also, the documentation of RemovalNotification says: A notification of the removal of a single entry. The key and/or value may be null if they were already garbage collected.
So according to it, conn could be null in the above example. How do we tear down the conn object properly in that case? The above code example would also throw a NullPointerException in that case.
The use case I am trying to address is:
The cache elements need to expire two minutes after creation.
The evicted object needs to be closed, but only after making sure no one is using it.

Guava contributor here.
My question is does the library make sure that the object is not being used by any other thread before calling the provided RemovalListener.
No, that would be impossible for Guava to do generally -- and a bad idea anyway! If the cache values were Integers, then because Integer.valueOf reuses Integer objects for integers below 128, you could never expire an entry with a value below 128. That would be bad.
Also the documentation of RemovalNotification says that: A notification of the removal of a single entry. The key and/or value may be null if they were already garbage collected. So according to it conn could be null in the above example.
To be clear, that's only possible if you're using weakKeys, weakValues, or softValues. (And, as you've correctly deduced, you can't really use any of those if you need to do some teardown on the value.) If you're only using some other form of expiration, you'll never get a null key or value.
In general, I don't think a GC-based solution is going to work here. You must have a strong reference to the connection to close it properly. (Overriding finalize() might work here, but that's really a broken thing generally.)
Instead, my approach would be to cache references to a wrapper of some sort. Something like
class ConnectionWrapper {
    private Connection connection;
    int users = 0;                   // read by the RemovalListener
    boolean expiredFromCache = false; // set by the RemovalListener
    public Connection acquire() { users++; return connection; }
    public void release() {
        users--;
        if (users == 0 && expiredFromCache) {
            // The cache expired this connection.
            // We're the only ones still holding on to it.
            tearDown();
        }
    }
    synchronized void tearDown() {
        connection.tearDown();
        connection = null; // disable myself
    }
}
and then use a Cache<Key, ConnectionWrapper> with a RemovalListener that looks like...
new RemovalListener<Key, ConnectionWrapper>() {
    public void onRemoval(RemovalNotification<Key, ConnectionWrapper> notification) {
        ConnectionWrapper wrapper = notification.getValue();
        if (wrapper.users == 0) {
            // do the teardown ourselves; nobody's using it
            wrapper.tearDown();
        } else {
            // it's still in use; mark it as expired from the cache
            wrapper.expiredFromCache = true;
        }
    }
}
...and then force users to use acquire() and release() appropriately.
There's really not going to be any way better than this approach, I think. The only way to detect that there are no other references to the connection is to use GC and weak references, but you can't tear down a connection without a strong reference to it -- which destroys the whole point. You can't guarantee whether it's the RemovalListener or the connection user who'll need to tear down the connection, because what if the user takes more than two minutes to do its thing? I think this is probably the only feasible approach left.
(Warning: the above code assumes only one thread will be doing things at a time; it's not synchronized at all, but hopefully if you need it, then this is enough to give you an idea of how it should work.)
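For the multi-threaded case, a thread-safe variant of the same reference-counting idea could look like the sketch below. It is an illustration under assumptions, not Guava API and not the answerer's code: the class SharedResource and its method names are invented, the value is any AutoCloseable, and acquire() returns null once the entry has been evicted so that callers go back to the cache for a fresh value.

```java
/** Reference-counted holder that closes its resource once it is evicted and unused. */
final class SharedResource<T extends AutoCloseable> {
    private final T resource;
    private int users = 0;           // guarded by this
    private boolean evicted = false; // guarded by this
    private boolean closed = false;  // guarded by this

    SharedResource(T resource) {
        this.resource = resource;
    }

    /** Returns the resource, or null if it was evicted and must be reloaded from the cache. */
    synchronized T acquire() {
        if (evicted) {
            return null;
        }
        users++;
        return resource;
    }

    /** Must be called exactly once for every successful acquire(). */
    synchronized void release() {
        if (users <= 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        users--;
        closeIfUnused();
    }

    /** Intended to be called from the cache's removal listener. */
    synchronized void markEvicted() {
        evicted = true;
        closeIfUnused();
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Called with the monitor held: closes exactly once, by whoever comes last.
    private void closeIfUnused() {
        if (evicted && users == 0 && !closed) {
            closed = true;
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close resource", e);
            }
        }
    }
}
```

Because all state changes happen under the same monitor, either the removal listener or the last user performs the close, never both. Callers would typically pair acquire() and release() in a try/finally block.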

How to make an async listener do blocking?

I am writing a blackberry app that communicates with a simple Bluetooth peripheral using text based AT commands - similar to a modem... I can only get it working on the blackberry using an event listener. So the communication is now asynchronous.
However, since it is a simple device and I need to control concurrent access, I would prefer to just have a blocking call.
I have the following code which tries to convert the communications to blocking by using a wait/notify. But when I run it, notifyResults never runs until getStringValue completes. i.e. it will always timeout no matter what the delay.
The btCon object runs on a separate thread already.
I'm sure I am missing something obvious with threading. Could someone kindly point it out?
Thanks
I should also add that the notifyAll blows up with an IllegalMonitorStateException.
I previously tried it with a simple boolean flag and a wait loop. But the same problem existed. notifyResult never runs until after getStringValue completes.
public class BTCommand implements ResultListener {
    String cmd;
    private BluetoothClient btCon;
    private String result;
    public BTCommand(String cmd) {
        this.cmd = cmd;
        btCon = BluetoothClient.getInstance();
        btCon.addListener(this);
        System.out.println("[BTCL] BTCommand init");
    }
    public String getStringValue() {
        result = "TIMEOUT";
        btCon.sendCommand(cmd);
        System.out.println("[BTCL] BTCommand getStringValue sent and waiting");
        synchronized (result) {
            try {
                result.wait(5000);
            } catch (InterruptedException e) {
                System.out.println("[BTCL] BTCommand getStringValue interrupted");
            }
        }//sync
        System.out.println("[BTCL] BTCommand getStringValue result=" + result);
        return result;
    }
    public void notifyResults(String cmd) {
        if (cmd.equalsIgnoreCase(this.cmd)) {
            synchronized (result) {
                result = btCon.getHash(cmd);
                System.out.println("[BTCL] BTCommand resultReady: " + cmd + "=" + result);
                result.notifyAll();
            }//sync
        }
    }
}

Since both notifyResults and getStringValue have synchronized blocks on the same object, if getStringValue reaches its synchronized section first, notifyResults will block at the start of its synchronized block until getStringValue leaves the synchronized section. If I understand correctly, this is the behaviour you're seeing.
Nicholas' advice is probably good, but you may not find any of those implementations in the BlackBerry APIs you're using. You may want to have a look at the producer-consumer pattern.

It may be more appropriate to use a Latch, Semaphore, or a Barrier, as recommended in Brian Goetz's book Java Concurrency in Practice.
These classes will make it easier to write blocking methods, and will likely help to prevent bugs, especially if you are unfamiliar with wait() and notifyAll(). (I am not suggesting that YOU are unfamiliar, it is just a note for others...)
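As a rough sketch of the latch approach, assuming java.util.concurrent is available on the target platform (which may not be true for the BlackBerry APIs in the question): the class LatchedResult and its methods are hypothetical, and each instance is meant for a single command/response exchange.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** One-shot hand-off of a result from a listener thread to a blocked caller. */
final class LatchedResult {
    private final CountDownLatch ready = new CountDownLatch(1);
    private volatile String result;

    /** Called from the listener thread when the response arrives. */
    void deliver(String value) {
        result = value;
        ready.countDown(); // releases any thread blocked in awaitResult
    }

    /** Blocks until deliver() is called or the timeout elapses; returns null on timeout. */
    String awaitResult(long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS) ? result : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A latch cannot be reset, so a new LatchedResult would be created for every command that is sent.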

The code will work if you use a final lock object instead of the String variable. I'm surprised that you don't get an NPE or IMSE.
Create a field:
private final Object resultLock = new Object();
Change all synchronized sections to use it instead of the String field result.
I don't like the magic number of 5 seconds. I hope you treat a null result as a timeout in your application.
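The dedicated lock matters because the question's code reassigns result inside the synchronized block and then calls notifyAll() on the new String, whose monitor the thread does not hold; that is what raises the IllegalMonitorStateException. A minimal sketch of the lock-object version follows. The class BlockingResult and its method names are hypothetical, and it sticks to Object.wait/notifyAll so that it does not depend on java.util.concurrent.

```java
/** Blocking hand-off of one result, built on a dedicated lock object. */
final class BlockingResult {
    private final Object resultLock = new Object(); // never reassigned: both sides use the same monitor
    private String result;                          // guarded by resultLock; null means "not delivered yet"

    /** Called from the listener thread when the device answers. */
    void deliver(String value) {
        synchronized (resultLock) {
            result = value;
            resultLock.notifyAll(); // legal: this thread holds resultLock's monitor
        }
    }

    /** Blocks until a result is delivered or the timeout elapses; returns null on timeout. */
    String awaitResult(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (resultLock) {
            try {
                // Loop guards against spurious wakeups and early notifications.
                while (result == null) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        return null;
                    }
                    resultLock.wait(remaining); // releases the monitor while waiting
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            return result;
        }
    }
}
```

Because wait() releases the monitor, the listener thread can enter deliver() while the caller is blocked, and a result delivered before awaitResult() is even called is still picked up by the null check.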
