Java in-memory cache with subcaches for each client

I have a ConcurrentMap which is my in-memory cache/database for my web app.
There I store my entities with the id as the key.
This is basically what my entity looks like:
public class MyEntity {
    private int id;
    private String name;
    private Date start;
    private Date end;
    ...
}
Now I have multiple users who request different data from my map.
User1 has a filter on the start date, so for example he only gets items 1, 2, and 3 of my map. User2 also has a filter and only gets items 2, 3, 4, and 5.
So each user only gets a part of the complete map. I am doing the filtering on my server because the map is too big to send in its entirety, and I also need to check other attributes.
My problem is that the entries in the map can be updated / removed / added by other API calls, and I want to live-update the entries on the user side.
For now I am sending a notification to the users that the map has been updated, and then every user reloads the complete data that he needs.
For example, item 8 has been updated. User1 gets a notification and loads items 1, 2, and 3 again, even though the update only affected item 8. So in this case the reload is unnecessary for User1 because he doesn't need item 8.
Now I am looking for a good solution so that a user only receives the updates he actually needs.
One way I was thinking about is to temporarily store the IDs of all items each user requested. On an update notification I can then check whether the updated item is in a user's list and send the updated item only to the users whose lists contain it.
But I am concerned about the memory usage this would create if there are a lot of users, since the per-user lists of item IDs can also become very big.
What would be a good solution to send an added / updated / removed item only to the users that need it?
So, something like observing only a part of the base map (cache), but with a notification for every action: adding, updating, and removing an item.
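For illustration, a minimal sketch of that per-item subscription idea, i.e. a reverse index from item ID to interested users (all names here are hypothetical):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reverse index: item ID -> users whose filtered view contains
// that item. An update to item 8 then only notifies the users registered
// under key 8, instead of every user reloading his whole result set.
public class SubscriptionIndex {

    private final Map<Integer, Set<String>> usersByItem = new ConcurrentHashMap<>();

    // Record which items a user received from his filtered query.
    public void register(String userId, Set<Integer> itemIds) {
        for (Integer itemId : itemIds) {
            usersByItem.computeIfAbsent(itemId, k -> ConcurrentHashMap.newKeySet())
                       .add(userId);
        }
    }

    // On add/update/remove of one item, look up exactly who needs a push.
    public Set<String> usersInterestedIn(int itemId) {
        return usersByItem.getOrDefault(itemId, Set.of());
    }
}

The memory cost is one entry per (item, user) pair, the same information as the per-user ID lists but organized for the lookup the update path actually needs; entries have to be removed when a user's filter changes or his session ends.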

Basically there is no "Silver Bullet". A possible solution depends on the usage patterns, your resources and requirements.
Questions to ask:
How big is the total dataset, now and in the future?
How big is the displayed dataset?
How many users with searches?
Is it a business application with a closed user group or a public web application that needs to scale?
What is the update frequency and what kind of updates are common?
How many different search patterns are present?
The biggest question of all:
Is it really needed?
Looking at a bunch of search interfaces, the user expects an update only upon interaction. If it is similar to a search, users will not expect an instant update. An instant update can be useful and "innovative", but you will spend a lot of engineering effort on that problem.
So before jumping onto the engineering task, make sure that the benefit will justify the costs. Maybe check on these alternative approaches first:
Don't do it. Only update on a user interaction. Maybe add a reload button.
Only notify the user that an update occurred, but transfer the data only when the user hits "reload". This avoids wasted transfers when the user does not have the browser tab in focus.
Regarding your concern about the memory usage of storing the requested item IDs per user:
In our applications we observe that there are not that many different search patterns / filters. You might have 1,000 user sessions but only 20 distinct popular searches. If that holds for you, you can do the following:
For each search filter, store a hash value of its result.
When you update the main data, run the searches again and send an update to a user only if the hash value of his search result changed.
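A minimal sketch of that hash check, assuming MyEntity implements equals/hashCode over its fields so that List.hashCode() reflects content changes (class and method names are illustrative):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One stored hash per distinct filter. After the main data changes, rerun
// each popular search once and notify only the sessions whose filter's
// result hash actually changed.
public class SearchResultHasher {

    private final Map<String, Integer> lastHashByFilter = new ConcurrentHashMap<>();

    // Returns true when the result for this filter differs from the last run.
    public boolean resultChanged(String filterKey, List<MyEntity> result) {
        int newHash = result.hashCode(); // content-based via MyEntity.hashCode()
        Integer previous = lastHashByFilter.put(filterKey, newHash);
        return previous == null || previous != newHash;
    }
}

With 1,000 sessions but only 20 distinct filters, an update then costs 20 re-searches and 20 hash comparisons instead of 1,000 per-user reloads.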
If it is a public web application, you should optimize more. E.g. don't send an update when the application has no focus.

Related

Is it a good idea to have unique keys to better aggregate data in MongoDB

Hello, I am creating an app where people join groups to do tasks, and each group has a unique name. I want to be able to update every user document associated with a specific group without looping over the users and updating each one per iteration.
I want to know if it is a good idea to have a unique key like this in MongoDB:
{
...
"specific_group_name": (whatever data point here)
...
}
in each of the user documents, so I can just call a simple
updateMany(eq("specific_group_name", (whatever data point here)), Bson object)
to decrease the run time involved, in case there are a lot of users within the group.
Thank you
Just a point to note: instead of the specific group name, better make sure it is a specific groupId. Also pay special attention to the cases where you have to remove a group from people, and to cases where a particular person in the group should not receive the update.
What you want to do is entirely valid, though. If you put specific_group_name/id in the collection, you are moving the selection logic into the database. If you do a one-by-one update, you have more flexibility in how you select the users to update on the Java/application side.
If the selection is simple (i.e. "always update people in this group"), then go ahead.
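A minimal sketch with the MongoDB Java driver, assuming a users collection whose documents carry a groupId field (database, collection, and field names are illustrative):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

public class GroupUpdater {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users = client.getDatabase("app").getCollection("users");

            // One round trip updates every member document of the group;
            // the selection logic runs in the database, not in Java.
            UpdateResult result = users.updateMany(
                    eq("groupId", "group-42"),
                    set("taskStatus", "completed"));

            System.out.println("Matched: " + result.getMatchedCount());
        }
    }
}

An index on groupId keeps the updateMany selection fast even when there are many users per group.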

Java Design Pattern on saving form data

I have a JSP page with many sections/categories for filling in hardware configuration details, and each section/category has many fields, filled in either by selecting a value from a list box or by entering data in a text box. The user may fill in some fields of a section and choose to fill in other sections later. When the user logs in the next time to continue, he must be shown the previously filled-in data for the respective sections/categories.
The current design is: when the user enters a value and moves to the next field, an AJAX call is made to persist the entered value to the DB. So if there are 10 fields in a section and 10 sections in the form, 100 JDBC calls are made, and if the user edits an already-entered field, additional JDBC calls are made. The 10 fields in a section are also dependent on each other: for example, if the first field is "Operating System Name" and I select "Windows", then the next field "OS Version" should only show values like "2000, 2007, 2008, etc.", and the next field "OS Architecture" should only show values relevant to that Windows version. This dependency is the main reason a JDBC call is made every time the user enters a value in a field.
I need your advice on how to make this design more efficient and minimize the JDBC calls. Thanks.
You could take a look at the J2EE Context Object pattern. It lets you encapsulate the state of the user's configuration and share it throughout your application. The Value List Handler pattern will also help you handle those expensive lookups by caching the results and allowing the client to traverse and select items from them.
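A hedged sketch of that idea: a session-scoped context object that buffers field values in memory and flushes a whole section with one call instead of one JDBC call per field (ConfigFormContext, ConfigDao, and all names here are hypothetical):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical DAO boundary: persists one whole section in a single
// batched INSERT/UPDATE instead of one statement per field.
interface ConfigDao {
    void saveSection(String section, Map<String, String> fields);
}

// Session-scoped context object holding the user's form state.
public class ConfigFormContext {

    // sectionName -> (fieldName -> value), in insertion order
    private final Map<String, Map<String, String>> sections = new LinkedHashMap<>();

    public void setField(String section, String field, String value) {
        sections.computeIfAbsent(section, s -> new LinkedHashMap<>()).put(field, value);
    }

    public String getField(String section, String field) {
        Map<String, String> fields = sections.get(section);
        return fields == null ? null : fields.get(field);
    }

    // Flush a completed section once, e.g. when the user leaves the section.
    public void flushSection(String section, ConfigDao dao) {
        Map<String, String> fields = sections.get(section);
        if (fields != null) {
            dao.saveSection(section, fields);
        }
    }
}

The dependent list-box values (OS name -> versions -> architectures) can be cached in the same context, so they are fetched once per session rather than on every field change.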

Checking if a Set of items exist in database quickly

I have an external service which I'm grabbing a list of items from, and persisting locally a relationship between those items and a user. I feed that external service a name, and get back the associated items with that name. I am choosing to persist them locally because I'd like to keep my own attributes about those external items once they've been discovered by my application. The items themselves are pretty static objects, but the total number of them are unknown to me, and the only time I learn about new ones is if a new user has an association with them on the external service.
When I get a list of them back from the external service, I want to check if they exist in my database first, and use that object instead but if it doesn't I need to add them so I can set my own attributes and keep the association to my user.
Right now I have the following (pseudocode, since it's broken into service layers etc):
Set<ExternalItem> items = externalService.getItemsForUser(user.name);
for (ExternalItem externalItem : items) {
    Item dbItem = sessionFactory.getCurrentSession().get(Item.class, externalItem.getId());
    if (dbItem == null) {
        // Not in database, create it.
        dbItem = mapToItem(externalItem);
    }
    user.addItem(dbItem);
}
sessionFactory.getCurrentSession().save(user); // Saves the associated Items also.
This operation takes around 16 seconds for approximately 500 external items. The remote call accounts for about 1 second of that, and the save is negligible as well. The drain I'm noticing comes from the numerous session.get(Item.class, id) calls.
Is there a better way to check for an existing Item in my database than this, given that I get a Set back from my external service?
Note: the external item's id is reliably the same as mine, and a single id will always represent the same external item.
I would definitely recommend a native query, as recommended in the comments.
I would not bother to chunk them, though, given the numbers you are talking about. Postgres should be able to handle an IN clause with 500 elements with no problems. I have had programmatically generated queries with many more items than that which performed fine.
This way you also have only one round trip, which, assuming the proper indexes are in place, really should complete in sub-second time.
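A sketch of the single-round-trip lookup with Hibernate, assuming the external and local ids match as noted above (ExternalItem.getId() and the Long id type are assumptions):

import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.hibernate.Session;

// Fetch all already-known items in one IN-clause query, then create only
// the items that are genuinely new.
Set<ExternalItem> items = externalService.getItemsForUser(user.name);
Set<Long> ids = items.stream()
        .map(ExternalItem::getId)
        .collect(Collectors.toSet());

Session session = sessionFactory.getCurrentSession();
Map<Long, Item> existing = session
        .createQuery("from Item i where i.id in (:ids)", Item.class)
        .setParameterList("ids", ids)
        .list()
        .stream()
        .collect(Collectors.toMap(Item::getId, Function.identity()));

for (ExternalItem externalItem : items) {
    Item dbItem = existing.get(externalItem.getId());
    if (dbItem == null) {
        dbItem = mapToItem(externalItem); // not in database, create it
    }
    user.addItem(dbItem);
}
session.save(user); // saves the associated Items also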

Designing a count based access control

I would like to get some advice on designing a count-based access control. For example, I want to restrict the number of users that a customer can create in my system based on their account. By default a customer can create 2 users, but if they upgrade their account they can create 5 users, and so on.
There are a few more features that I need to restrict on a similar basis.
The application follows a generic model, so every exposed feature has a backing table and a class which handles the CRUD operations on that table. The application also runs on multiple nodes and has a distributed cache.
The approach I am taking to implement this is as follows:
- I have a new table which captures the feature to control and the allowed limit (stored per customer).
- I intercept the create method for all tables and check whether the table in question needs access control. If so, I fetch the count of created entities and compare it against the limit to decide whether to allow the creation.
- I use the database to handle synchronization of concurrent requests. After the create method is called, I update the counter table using the where clause where ( count_column + 1 ) = #countInMemory#, i.e. the update succeeds only if the value stored in the DB plus 1 equals the value in memory. This ensures that even if two threads attempt a create at the same time, only one of them can successfully update; the thread that successfully updates wins and the other one is rolled back. This way I do not need to synchronize any code in the application.
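As a sketch, that optimistic check could look like this in plain JDBC; the table and column names (usage_counts, user_count) are hypothetical, and expectedNewCount is the count read into memory plus one:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Succeeds (returns true) only if nobody else incremented the counter since
// we read it: the DB value + 1 must still equal our in-memory value.
boolean reserveSlot(Connection conn, long customerId, int expectedNewCount)
        throws SQLException {
    String sql = "UPDATE usage_counts SET user_count = ? "
               + "WHERE customer_id = ? AND (user_count + 1) = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, expectedNewCount);
        ps.setLong(2, customerId);
        ps.setInt(3, expectedNewCount);
        return ps.executeUpdate() == 1; // 0 rows -> lost the race, roll back
    }
}

The statement is plain standard SQL, so it is portable across Oracle and MySQL.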
I would like to know if there is any other / better way of doing this. My application runs on Oracle and MySQL DB.
Thanks for the help.
When you roll back, do you retry (after fetching the new user count) or do you fail? I recommend the former, assuming the newly fetched user count would permit another user.
I've dealt with a similar system recently, and there are a few things to consider:
Do you want CustomerA to be able to transfer their users to CustomerB? (This assumes that customers are not independent; for example, in our system CustomerA might be an IT manager and CustomerB an accounting manager at the same company, and when one of CustomerA's employees moves to accounting, that should be reflected in CustomerB's account.)
What happens to a customer's users when the customer is deleted? (In our case another customer/manager would need to adopt them, or else they would be deleted.)
How are you storing the customer's user limit: in a separate table (e.g. a customer has type "Level2", and the customer-type table says that "Level2" customers can create 5 users), in the customer's row (more error-prone, but it allows a per-customer override of the maximum user count), or a combination (a type column that says 5 users plus an override column that grants an additional 3)?
But that's beside the point. Your DB synchronization is fine.

Store data in session, how and when to detect if data is stale

The scenario I have is this:
1. User does a search
2. Handler finds results and stores them in the session
3. User sees the results and clicks one of them to view it
4. After viewing, user clicks "Back to Search"
5. Handler detects it's a back-to-search, skips the search, and retrieves the results from the session
6. User sees the same results, as expected
At #5, if a new item was created that fits the user's search criteria, it should be part of the results. But since at #5 I only retrieve from the session, it will not be picked up.
My question is: should I do an extra step of checking? If so, how can I check effectively without doing an actual retrieve (which would defeat the purpose)? Maybe do a select count(*) ... and compare it with the count of the result set in the session?
Caching something like search results in a session is something I strongly advise against. Web apps should strive to keep session state as small as possible. Blanket logic that caches search results (presumably several KB at least) in per-user session state is really asking for memory problems down the road.
Instead, you should have a singleton search service which manages its own cache. Although this appears similar in strategy to caching inside the session, it has several advantages:
you can re-use common search results among users; depending on the types of searches, this could be significant
you can manage cache size in the service layer; something like ehcache is easy to implement and gives you lots of configurability (and protection against out-of-memory issues)
you can manage cache validity in the service layer; i.e. when the "update item" service has its save() method triggered, it can tell the search service to invalidate either its entire cache or just the cached results that correspond to the newly updated/created item (see the sketch below)
The third point above addresses your main question.
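A minimal sketch of such a service-layer cache with invalidation, using a plain ConcurrentHashMap instead of ehcache for brevity (SearchService, Item, and the query type are illustrative):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Singleton-style search service: results are cached per query string and
// shared across users; the update/save path tells it when to invalidate.
public class SearchService {

    private final Map<String, List<Item>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<Item>> searcher; // runs the real search

    public SearchService(Function<String, List<Item>> searcher) {
        this.searcher = searcher;
    }

    public List<Item> search(String query) {
        return cache.computeIfAbsent(query, searcher);
    }

    // Called from the "update item" service's save() after a successful
    // commit; a finer-grained variant could evict only the affected queries.
    public void invalidateAll() {
        cache.clear();
    }
}

With this in place, "Back to Search" simply calls search() again: it is served from the cache unless an update invalidated it, in which case the query reruns and picks up new items.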
It depends on your business needs. If it's imperative that the user has the latest, up-to-date results, then you'll have to re-pull them.
A count wouldn't be 100% reliable, because there could be corresponding deletions.
You might be able to compare timestamps or something but I suspect all the complexity involved would just introduce further issues.
Keep it simple and rerun your search.
In order to see if there are new items, you likely will have to rerun your search - even just to get a count.
You are effectively caching the search results. The normal answer is therefore either to expire the results after a set amount of time (e.g. the results are only valid for 1 minute), or to have a system where a change to the data invalidates the cache, forcing the search to run again.
Are there likely to be any new results by the time the user gets back there? You could just put a 'refresh' button on the search results pages to cause the search to be run again.
What kind of refresh rate are you expecting on the DB items? Would the search results change drastically even over short intervals? I am not aware of such a scenario, but you might have a different case.
Assuming your DB is populated by one or more separate threads and another independent thread searches for results, keep track of the timestamp of the latest item inserted into the DB alongside your cache.
Now, when the user wants to see the search results again, compare the timestamps, i.e. compare your cache timestamp with that of the last item inserted into the DB. If they do not match, re-query; otherwise serve from your cache.
If your scenario matches my assumption that the DB is not updated too frequently (with respect to a specific search term or criteria), this could save you from querying the DB too often.
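A sketch of that freshness check, assuming the items table has an indexed created_at column (table, column, and method names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Cheap staleness probe: compare the cache timestamp against the newest row.
// Note this detects inserts only; updates/deletes need the column maintained
// for them too (see the deletion caveat above).
boolean cacheIsFresh(Connection conn, Timestamp cachedAt) throws SQLException {
    String sql = "SELECT MAX(created_at) FROM items";
    try (PreparedStatement ps = conn.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        rs.next(); // an aggregate query always returns exactly one row
        Timestamp latest = rs.getTimestamp(1);
        return latest == null || !latest.after(cachedAt);
    }
}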
