Memcache getItemCount() counting expired keys? - java

I want to get the count of "alive" keys at any given time. According to the API documentation, getItemCount() is meant to return this.
However, it does not: expired keys do not reduce the value of getItemCount(). Why is this? How can I accurately get a count of all "active" or "alive" keys that have not expired?
Here's my put code:
syncCache.put(uid, cachedUID, Expiration.byDeltaSeconds(3), SetPolicy.SET_ALWAYS);
That should expire keys after 3 seconds, and it does expire them, but getItemCount() does not reflect the true count of keys.
UPDATE:
It seems memcache might not be what I should be using, so here's what I'm trying to do.
I wish to write a Google App Engine server/app that works as a "users online" feature for a desktop application. The desktop application makes an HTTP request to the app with a unique ID as a parameter. The app stores this UID along with a timestamp. This is done every 3 minutes.
Every 5 minutes, any entries with a timestamp outside of that 5-minute window are removed. Then you count how many entries remain, and that's how many users are "online".
The expire feature seemed perfect as then I wouldn't even need to worry about timestamps or clearing expired entries.

It might be a problem in the documentation; the Python docs do not mention anything about counting only live keys, and the behaviour is reproducible in Python as well.
See also this related post: How does the lazy expiration mechanism in memcached operate?

getItemCount() counts expired keys because that's the way memcached works (lazy expiration), and many other caches behave the same way.
Memcache can help you do what you describe, but not in the way you tried to do it. Consider the completely opposite situation: you put online users in memcache, and then App Engine wipes them out because it is short of free memory. The cache gives you no guarantee that items will be stored for any particular period; what memcache buys you is fewer datastore requests and lower latency.
One way to do it:
Maintain a sorted map of (user-id, last-login-refreshed) entries, stored in the datastore (or in memcache if you do not need it to be very precise). On every login/refresh you update the value under that user's key, and a periodic cron job removes stale users from the map. The size of the map is then the number of logged-in users at that moment; a minimal sketch follows below.
Make sure the map fits in 1 MB, which is the size limit for both memcache values and datastore entities.
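A minimal sketch of that bookkeeping, using a plain in-memory map for illustration (on App Engine you would persist the map as a single datastore entity or memcache value; the class name is made up, and the 5-minute window is taken from the question):

    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: track last-refresh timestamps and count "online" users.
    public class OnlineUsers {
        private static final long WINDOW_MILLIS = 5 * 60 * 1000L; // 5-minute window

        private final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();

        // Called on every ping from the desktop app (every 3 minutes per client).
        public void refresh(String uid) {
            lastSeen.put(uid, System.currentTimeMillis());
        }

        // Called from the periodic cron job: drop entries older than the window.
        public void evictStale() {
            long cutoff = System.currentTimeMillis() - WINDOW_MILLIS;
            lastSeen.values().removeIf(ts -> ts < cutoff);
        }

        // Number of users seen within the window, i.e. "users online".
        public int onlineCount() {
            evictStale();
            return lastSeen.size();
        }
    }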

Related

Java cache with expiration since first write

I have events which should be accumulated into a persistent key-value store. 24 hours after a key is first inserted, the accumulated record should be processed and removed from the store.
The processing of expired data is distributed among multiple nodes, so using a database would involve synchronization problems, and I don't want to use any SQL database.
The best fit for me is probably some cache whose expiration policy can be configured to my needs. Is there any? Or can this be solved with some NoSQL database?
It should be possible with products like Infinispan or Hazelcast.
Both are JSR107 compatible.
With a JSR107-compatible cache API, a possible approach is to set your 24-hour expiry via the CreatedExpiryPolicy. Next, you implement and register a CacheEntryExpiredListener to get a call when an entry expires.
The call on the CacheEntryExpiredListener may be lenient and implementation dependent; the event is actually triggered on the "eviction due to expiry". For example, one implementation may do a periodic scan and remove expired entries every 30 minutes. However, that "lag time" is adjustable in most implementations, so you should be able to operate within defined bounds.
Also check whether there are resource constraints for the event callbacks that you may run into, such as thread pools.
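A minimal sketch against the plain JSR107 (javax.cache) API, assuming an implementation such as Hazelcast or Infinispan is on the classpath; the cache name, the String payloads, and the ProcessOnExpiry listener are illustrative:

    import java.util.concurrent.TimeUnit;
    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;
    import javax.cache.configuration.FactoryBuilder;
    import javax.cache.configuration.MutableCacheEntryListenerConfiguration;
    import javax.cache.configuration.MutableConfiguration;
    import javax.cache.event.CacheEntryEvent;
    import javax.cache.event.CacheEntryExpiredListener;
    import javax.cache.event.CacheEntryListenerException;
    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;

    public class ExpiryProcessingSketch {

        // Called when entries expire; exact timing depends on the implementation.
        public static class ProcessOnExpiry
                implements CacheEntryExpiredListener<String, String> {
            @Override
            public void onExpired(
                    Iterable<CacheEntryEvent<? extends String, ? extends String>> events)
                    throws CacheEntryListenerException {
                for (CacheEntryEvent<? extends String, ? extends String> e : events) {
                    // Process the accumulated record here, e.g. hand it to a worker.
                    System.out.println("expired: " + e.getKey());
                }
            }
        }

        public static void main(String[] args) {
            CacheManager manager = Caching.getCachingProvider().getCacheManager();

            MutableConfiguration<String, String> config =
                new MutableConfiguration<String, String>()
                    .setTypes(String.class, String.class)
                    // Expire 24h after creation, not after last access/update.
                    .setExpiryPolicyFactory(
                        CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.HOURS, 24)))
                    .addCacheEntryListenerConfiguration(
                        new MutableCacheEntryListenerConfiguration<String, String>(
                            FactoryBuilder.factoryOf(ProcessOnExpiry.class),
                            null,    // no event filter
                            true,    // deliver the old value to the listener
                            false)); // asynchronous delivery

            Cache<String, String> cache = manager.createCache("events", config);
            cache.put("event-42", "accumulated-payload");
        }
    }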
I mentioned Infinispan and Hazelcast for two reasons:
You may need the distribution capabilities.
Since you do long-running processing and store data that is not recoverable, you may need the persistence and fault-tolerance features. So I would say a simple in-memory cache like Google Guava's is out of scope.
Good luck!

Web Service Architecture: Redis (as cache) & PostgreSQL for persistence

I'm developing a Java REST API that uses client data from a PostgreSQL database.
The numbers:
- About 600 clients at the beginning
- Some of them making requests every few seconds
Because clients pay per request, we need to check whether their number of successful requests has reached their limit. Querying PostgreSQL (to update the value of the 'hitsCounter' field) after every request is bad in terms of performance, so we are thinking about implementing a caching layer with Redis.
The idea:
After a client makes his first request, we retrieve his data from PostgreSQL and store it in the Redis cache. Then we work with this cached data, for example incrementing the 'hitsCounter' key value, until the client stops making requests.
In parallel, every few minutes a background process persists the data from the Redis cache to the DB tables, so at the end we have the updated data back in PostgreSQL and can deal with it in the future.
I think this would obviously increase performance, but I'm not sure about the "background process". One option is to check the TTL of the cache elements and, if it is below some threshold (meaning the client has stopped making requests), persist the data.
I would love to hear some opinions about this. Is this a good idea? Do you know some better alternatives?
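For reference, a minimal sketch of the flow described above, assuming the Jedis client; the Redis key pattern and the table/column names are made up for illustration:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import redis.clients.jedis.Jedis;

    // Illustrative sketch: count hits in Redis, persist them back to PostgreSQL
    // from a periodic background job.
    public class HitCounterCache {
        private final Jedis jedis = new Jedis("localhost", 6379);

        // Called on every successful request: atomic increment in Redis.
        public long recordHit(String clientId) {
            return jedis.incr("hits:" + clientId);
        }

        // Called by the background job every few minutes.
        public void persist(String clientId, Connection db) throws SQLException {
            String hits = jedis.get("hits:" + clientId);
            if (hits == null) {
                return; // nothing cached for this client
            }
            try (PreparedStatement ps = db.prepareStatement(
                    "UPDATE clients SET hits_counter = ? WHERE id = ?")) {
                ps.setLong(1, Long.parseLong(hits));
                ps.setString(2, clientId);
                ps.executeUpdate();
            }
        }
    }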
Perfectly reasonable idea, but you've not mentioned any measurements you've made. What is the bottleneck in your target hardware with your target transaction levels? Without knowing that, you can't say.
You could perhaps use an unlogged table. Just insert a row with every query, then summarise every 5 minutes, clearing out old data. Then again, with HOT updates and, say, a 75% fill-factor, maybe in-place updates are more efficient. I don't know (and nor do you); we haven't measured it.
Not enough? Stick it on its own tablespace on ssd.
Not enough? Stick it on its own vm/machine.
Not enough? Just write the damn stuff to flat files on each front-end box and batch the data once a minute into the database.
Also - how much are they paying per query? Do you care if power fails and you lose five seconds of query logs? Do you need to be able to reproduce receipts for each query with originating details and a timestamp?

Store data in session, how and when to detect if data is stale

The scenario I have is this.
1. User does a search
2. Handler finds the results, stores them in the session
3. User sees the results, decides to click one of them to view it
4. After viewing, user clicks "Back to Search"
5. Handler detects it's a back-to-search, skips the search and instead retrieves the results from the session
6. User sees the same results, as expected
At #5, if a new item was created that fits the user's search criteria, it should be part of the results; but since at #5 I'm just retrieving from the session, it will not be picked up.
My question is: should I be doing an extra step of checking? If so, how do I check effectively without doing an actual retrieve (which would defeat the purpose)? Maybe do a select count(*) ... and compare that with the count of the resultset in the session?
Caching something like search results in a session is something I strongly advise against. Web apps should strive to have the smallest session state possible. Putting in blanket logic to cache search results (presumably several KB at least) in user session state is really asking for memory problems down the road.
Instead, you should have a singleton search service which manages its own cache. Although this appears similar in strategy to caching inside the session, it has several advantages:
- you can re-use common search results among users; depending on the types of searches this could be significant
- you can manage cache size in the service layer; something like Ehcache is easy to implement and gives you lots of configurability (and protection against out-of-memory issues)
- you can manage cache validity in the service layer; i.e. if the "update item" service has had its save() method triggered, it can tell the search service to invalidate either its entire cache or just the cached results that correspond to the newly updated/created item
The third point above addresses your main question.
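As a sketch of that design, a singleton search service that owns its cache and exposes an invalidation hook might look like the following; the class and method names are illustrative, and a plain ConcurrentHashMap stands in for a real cache like Ehcache:

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: a shared search service owning its own cache,
    // invalidated by the update path rather than stored per user session.
    public class SearchService {
        private static final SearchService INSTANCE = new SearchService();
        public static SearchService getInstance() { return INSTANCE; }

        // criteria -> results; use a bounded cache (e.g. Ehcache) in production.
        private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();

        public List<String> search(String criteria) {
            // Results are re-used across all users issuing the same criteria.
            return cache.computeIfAbsent(criteria, this::runQuery);
        }

        // Called from the "update item" service's save() method.
        public void invalidateAll() {
            cache.clear();
        }

        private List<String> runQuery(String criteria) {
            // ... the actual database query goes here ...
            return List.of();
        }
    }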
It depends on your business needs. If it's imperative that the user has the latest up-to-date results, then you'll have to re-pull them.
A count wouldn't be 100% reliable, because there could be corresponding deletions.
You might be able to compare timestamps or something, but I suspect all the complexity involved would just introduce further issues.
Keep it simple and rerun your search.
In order to see if there are new items, you likely will have to rerun your search - even just to get a count.
You are effectively caching the search results. The normal answer is therefore either to expire the results after a set amount of time (e.g. the results are only valid for 1 minute) or to have a system where, when the data changes, the cache is invalidated, causing the search to be run again.
Are there likely to be any new results by the time the user gets back there? You could just put a 'refresh' button on the search results pages to cause the search to be run again.
What kind of refresh rate are you expecting on the DB items? Would the search results change drastically even over short intervals? I am not aware of such a scenario, but you might have a different case.
Assuming you have a scenario where your DB is populated by one or more separate threads and another independent thread searches for results, keep track in your cache of the timestamp of the latest item inserted into the DB.
Now, when the user wants to see the search results again, compare timestamps: your cache timestamp against that of the last item inserted into the DB. If they don't match, re-query; otherwise serve from your cache.
If your scenario confirms my assumption that the DB is not updated too frequently (with respect to a specific search term or criteria), then this could save you from querying the DB too often.
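A small sketch of that timestamp check; fetchLatestInsertTime and runSearch are hypothetical helpers standing in for real queries:

    import java.time.Instant;
    import java.util.List;

    // Illustrative sketch: serve cached results unless the DB has newer inserts.
    public class TimestampCheckedCache {
        private List<String> cachedResults;
        private Instant cachedAt = Instant.MIN;

        public synchronized List<String> results() {
            Instant latestInsert = fetchLatestInsertTime(); // e.g. SELECT max(created_at) ...
            if (cachedResults == null || latestInsert.isAfter(cachedAt)) {
                cachedResults = runSearch(); // re-query only when the cache is stale
                cachedAt = latestInsert;
            }
            return cachedResults;
        }

        private Instant fetchLatestInsertTime() { /* query the DB */ return Instant.now(); }
        private List<String> runSearch() { /* the actual search */ return List.of(); }
    }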

Are IdGeneratorStrategy.Identity values reused after a JDO has been deleted

I'm using Google App Engine.
If a Long key field is generated by IdGeneratorStrategy.Identity and then the object is deleted from the datastore, is there any chance of the key being used again by a different object of the same class?
papercrane on reddit writes:
The documentation for GenerationType.IDENTITY says that it means the persistence provider (the database) will provide the unique ID. So it is entirely up to your database software whether it decides to reuse IDs from deleted records. Without knowing anything else about your problem I'd say it is possible, but I can't think of any good reason for a database server to keep track of which IDs are in use and recycle old ones. That seems like a lot of overhead for very little benefit.
And Mark Ross, on Google Groups, writes on how GAE identities are generated:
Since the datastore in prod is comprised of multiple back-ends, we use a sharded-counter approach to dole out IDs so that we don't have to worry about different back-ends handing out the same ID. So, back-end A may be working from a pool of IDs ranging from 0 to 100 and back-end B may be working from a pool of IDs ranging from 101 to 200, and so on. If your inserts hit different datastore back-ends you'll get IDs that jump around a bit. You can depend on these IDs being unique, but not monotonically increasing.
I now think it is very unlikely that Identity values are reused, but it would still be good to have a clear, definitive answer.
App Engine will never reuse IDs for a given kind and parent. In fact, I think you'll be hard-pressed to find a database that does: keeping a simple counter is far, far simpler than trying to figure out which IDs are still in use, and with 64 bits you're not going to run out of IDs.

Cache with fixed expiry time in Java

My Java web application (Tomcat) gets all of its data from an SQL database. However, large parts of this database are only updated once a day via a batch job. Since queries on these tables tend to be rather slow, I want to cache the results.
Before rolling my own solution, I wanted to check out existing cache solutions for java. Obviously, I searched stackoverflow and found references and recommendations for ehcache.
But looking through the documentation, it seems it only allows setting the lifetime of cached objects as a duration (e.g. expire 1 hour after being added), while I want expiry at a fixed time of day (e.g. expire at 0h30 am).
Does anyone know a cache library that allows such expiry behaviour? Or how to do this with ehcache if that's possible?
EhCache allows you to programmatically set the expiry duration on an individual cache element when you create it. The values configured in ehcache.xml are just defaults.
If you know the specific absolute time at which the element should expire, you can calculate the difference in seconds between then and "now" (i.e. the time you add it to the cache), and set that as the time-to-live duration using Element.setTimeToLive().
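A minimal sketch of that calculation with the Ehcache 2.x Element API; it assumes an ehcache.xml with a defaultCache on the classpath, and the 0h30 target time comes from the question:

    import java.util.Calendar;
    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class FixedTimeExpiry {
        public static void put(Cache cache, Object key, Object value) {
            // Next occurrence of 00:30; if already past for today, use tomorrow's.
            Calendar expiry = Calendar.getInstance();
            expiry.set(Calendar.HOUR_OF_DAY, 0);
            expiry.set(Calendar.MINUTE, 30);
            expiry.set(Calendar.SECOND, 0);
            if (expiry.getTimeInMillis() <= System.currentTimeMillis()) {
                expiry.add(Calendar.DAY_OF_MONTH, 1);
            }

            // Seconds from "now" (insert time) until the fixed expiry time.
            int ttlSeconds = (int) ((expiry.getTimeInMillis() - System.currentTimeMillis()) / 1000);

            Element element = new Element(key, value);
            element.setTimeToLive(ttlSeconds); // overrides the ehcache.xml defaults
            cache.put(element);
        }

        public static void main(String[] args) {
            CacheManager manager = CacheManager.getInstance();
            manager.addCache("daily"); // created from defaultCache settings
            put(manager.getCache("daily"), "key", "value");
        }
    }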
Do you need a full-blown cache solution? You could use standard Maps and have a job scheduled to clear them at the required time.
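A minimal sketch of that simpler approach; the 0h30 schedule matches the question, and everything else is illustrative:

    import java.time.Duration;
    import java.time.LocalDateTime;
    import java.time.LocalTime;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch: a plain map cleared by a scheduled job at 00:30 daily.
    public class MapWithDailyClear {
        private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public MapWithDailyClear() {
            // Delay until the next 00:30, then clear the map every 24 hours.
            LocalDateTime now = LocalDateTime.now();
            LocalDateTime next = now.toLocalDate().atTime(LocalTime.of(0, 30));
            if (!next.isAfter(now)) {
                next = next.plusDays(1);
            }
            long initialDelayMs = Duration.between(now, next).toMillis();
            scheduler.scheduleAtFixedRate(cache::clear, initialDelayMs,
                    TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        }

        public Object get(String key) { return cache.get(key); }
        public void put(String key, Object value) { cache.put(key, value); }
    }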
