How to cache data if there is no unique identifier? - java

I have a spring-boot application with a method that returns some content, say bank customers. Customers are not added to the database very often. How can I cache this method so that it only fetches a fresh list from the database when a new customer has actually been added?
Sounds trivial enough, but I can't think of a mechanism.
That is, we have a method, say
GET /customers
The cache key is built from the unique parameters of the request. Since this method has no unique parameters, the data will always be returned from the cache, even after a new client has been added to the list.
For example, you could add some method with a boolean response that returns true when the list has been updated. But that looks like a nasty solution, and it means that instead of one request you have to make two. And if you also have authorization and authentication, that's three requests. That sounds bad.
Can you give me a hint?

Where will you store the cache? Some stores, like Redis, let you set an expiry on the cached data, so the data will be refreshed at whatever interval you configure.
Or you can store something like a 'version' in the database and also keep it in the cache. Every time you add a new customer, you also bump the version. Then you can compare the version in the database with the version in the cache; if they differ, fetch the new list from the database and re-add it to the cache. The downside is that this still needs one database call every time you hit GET /customers.
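A minimal sketch of that version idea, assuming a single-row customers_version counter table and Spring's JdbcTemplate; the table, column, and class names are illustrative, not a fixed API:

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class CustomerService {

    public record Customer(long id, String name) {}

    private final JdbcTemplate jdbc;
    private volatile long cachedVersion = -1;                   // version the cached list was built from
    private volatile List<Customer> cachedCustomers = List.of();

    public CustomerService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public List<Customer> getCustomers() {
        // Cheap query: read the single-row version counter instead of the whole table.
        long dbVersion = jdbc.queryForObject("SELECT version FROM customers_version", Long.class);

        if (dbVersion != cachedVersion) {
            // The version changed, so reload the full list and remember which version it belongs to.
            cachedCustomers = jdbc.query(
                    "SELECT id, name FROM customers",
                    (rs, i) -> new Customer(rs.getLong("id"), rs.getString("name")));
            cachedVersion = dbVersion;
        }
        return cachedCustomers;
    }

    public void addCustomer(String name) {
        jdbc.update("INSERT INTO customers (name) VALUES (?)", name);
        // Bump the version so the next GET /customers call refreshes its copy.
        jdbc.update("UPDATE customers_version SET version = version + 1");
    }
}

The trade-off mentioned above is visible here: every call still makes one small query, but the expensive customers query only runs when the version has moved.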

Related

Check if query result has changed, in order to not re-query for everything?

I am developing a java application that loads certain things from a database, such as client records and product info. When the user navigates to say the 'products' tab, I query for products in the database and update a table with that information.
I am wondering if there is a way to see if the query results have changed since the last check, in order to avoid querying and loading all the info from the database and instead just load the updates. Is there a way to do this, or perhaps to load only the changes from a query into my table list? My goal is to make the program run faster when switching between tabs.
I am wondering if there is a way to see if the query results have changed since the last check
Stated differently, you want a way to automatically answer the question “is this the same result?” without retrieving the entire result.
The general approach to this problem would be to come up with some fast-to-query proxy for the entire state of the result set, and query that instead.
Once you have determined a stable fast computation for the entire result set, you can compute that any time the relevant data changes; and only poll that stored proxy to see whether the data has changed.
For example, you could say that “the SHA-256 hash of fields lorem, ipsum, and dolor” is your proxy. You can now:
Implement that computation inside the database as a function, maybe products_hash.
Create a latest_products_hash table, that stores created timestamp and products_hash that was computed at that time.
In your application, retrieve the most recent record from latest_products_hash and keep it for reference.
In the database, have a scheduled job, or a trigger on some event you decide makes sense, that will compute and store the products_hash in latest_products_hash automatically without any action from the application.
To determine whether there have been updates yet, the application will query the latest_products_hash table again and compare its most recent record with the one the application stored for reference.
Only if the most-recent latest_products_hash value is different does the application query the products table and get the full result set.
That way, the application is polling a much faster query (the most-recent record in latest_products_hash) frequently, and avoiding the full products query until it knows the result set will be new.
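A rough sketch of the polling side in Java, assuming the latest_products_hash table described above is kept up to date by the database job; the table, column, and class names here are illustrative:

import java.math.BigDecimal;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class ProductRefresher {

    public record Product(long id, String name, BigDecimal price) {}

    private final JdbcTemplate jdbc;
    private String lastSeenHash;                        // proxy value the in-memory list was built from
    private List<Product> products = List.of();

    public ProductRefresher(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public List<Product> getProducts() {
        // Fast query: one tiny row instead of the whole products table.
        String latestHash = jdbc.queryForObject(
                "SELECT products_hash FROM latest_products_hash ORDER BY created DESC LIMIT 1",
                String.class);

        if (!latestHash.equals(lastSeenHash)) {
            // The proxy changed, so the result set is new; run the expensive query once.
            products = jdbc.query(
                    "SELECT id, name, price FROM products",
                    (rs, i) -> new Product(rs.getLong("id"), rs.getString("name"),
                            rs.getBigDecimal("price")));
            lastSeenHash = latestHash;
        }
        return products;
    }
}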

How to keep a java list in memory synced with a table in database?

I want to search for inputs in a list. That list resides in a database. I see two options for doing that:
Hit the DB for each search and return the result.
Keep a copy in memory, synced with the table, search in memory, and return the result.
I like the second option as it will be faster. However, I am unsure how to keep the list in sync with the table.
Example: I have a list L = [12, 11, 14, 42, 56]
and I receive an input: 14.
I need to return whether the input exists in the list or not. The list can be updated by other applications, so I need to keep the list in sync with the table.
What would be the most efficient approach here, and how do I keep the list in sync with the database?
Is there any way my application can be informed of changes in the table so that I can reload the list on demand?
Instead of recreating your own implementation of something that already exists, I would leverage Hibernate's Second Level Cache (2LC) with an implementation such as EhCache.
By using a 2LC, you can specify a time-to-live expiration for your entities, and once they expire, any query will reload them from the database. If the entities have not yet expired, Hibernate will hydrate them from the 2LC application cache rather than the database.
If you are using Spring, you might also want to take a look at @Cacheable. This operates at the component / bean tier, allowing Spring to cache a result set into a named region. See their documentation for more details.
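For the Spring route, a minimal sketch assuming a hypothetical CodeRepository and a cache region named "codes" whose time-to-live is configured on the cache provider (EhCache, for example); the repository and method names are illustrative:

import java.util.List;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

interface CodeRepository {
    List<Integer> findAll();
    void insert(int code);
}

@Service
public class CodeLookupService {

    private final CodeRepository repository;

    public CodeLookupService(CodeRepository repository) {
        this.repository = repository;
    }

    // The first call hits the database; later calls are served from the "codes" region
    // until the provider's TTL expires it. Updates made by other applications become
    // visible once the entry expires.
    @Cacheable("codes")
    public List<Integer> loadCodes() {
        return repository.findAll();
    }

    // If this application also writes, evicting on write refreshes the cache sooner
    // than waiting for the TTL.
    @CacheEvict(value = "codes", allEntries = true)
    public void addCode(int code) {
        repository.insert(code);
    }
}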
To satisfy your requirement, you should control reads and writes in one place; otherwise, there will always be cases where the data is out of sync.

Checking if a Set of items exist in database quickly

I have an external service which I'm grabbing a list of items from, and persisting locally a relationship between those items and a user. I feed that external service a name, and get back the associated items with that name. I am choosing to persist them locally because I'd like to keep my own attributes about those external items once they've been discovered by my application. The items themselves are pretty static objects, but the total number of them are unknown to me, and the only time I learn about new ones is if a new user has an association with them on the external service.
When I get a list of them back from the external service, I want to check if they exist in my database first and use those objects instead, but if they don't exist I need to add them so I can set my own attributes and keep the association to my user.
Right now I have the following (pseudocode, since it's broken into service layers etc):
Set<ExternalItem> items = externalService.getItemsForUser(user.name);
for (ExternalItem externalItem : items) {
    Item dbItem = sessionFactory.getCurrentSession().get(Item.class, externalItem.getId());
    if (dbItem == null) {
        // Not in database, create it.
        dbItem = mapToItem(externalItem);
    }
    user.addItem(dbItem);
}
sessionFactory.getCurrentSession().save(user); // Saves the associated Items also.
The time this operation is taking is around 16 seconds for approximately 500 external items. The remote operation is around 1 second of that, and the save is negligible also. The drain that I'm noticing comes from the numerous session.get(Item.class,item.id) calls I'm doing.
Is there a better way to check for an existing Item in my database than this, given that I get a Set back from my external service?
Note: the external item's id is reliably the same as mine, and a single id will always represent the same external item.
I would definitely recommend a native query, as recommended in the comments.
I would not bother to chunk them, though, given the numbers you are talking about. Postgres should be able to handle an IN clause with 500 elements with no problems. I have had programmatically generated queries with many more items than that which performed fine.
This way you also have only one round trip, which, assuming the proper indexes are in place, really should complete in sub-second time.
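A sketch of that single round trip with Hibernate, reusing the names from the question; the items table name, the entity mapping, and the getter names are assumptions:

Set<ExternalItem> items = externalService.getItemsForUser(user.name);

// Collect the external ids once instead of issuing ~500 session.get() calls.
List<Long> ids = items.stream()
        .map(ExternalItem::getId)
        .collect(Collectors.toList());

// One native query with an IN clause fetches every Item that already exists.
List<Item> existing = sessionFactory.getCurrentSession()
        .createNativeQuery("SELECT * FROM items WHERE id IN (:ids)", Item.class)
        .setParameterList("ids", ids)
        .getResultList();

Map<Long, Item> existingById = existing.stream()
        .collect(Collectors.toMap(Item::getId, Function.identity()));

for (ExternalItem externalItem : items) {
    Item dbItem = existingById.get(externalItem.getId());
    if (dbItem == null) {
        dbItem = mapToItem(externalItem);   // not in the database yet, create it
    }
    user.addItem(dbItem);
}
sessionFactory.getCurrentSession().save(user); // saves the associated Items also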

Caching in an enterprise web application

I'm relatively new to using caching in larger programs intended to be used by a large number of people. I know what caching is and why it's beneficial in general, and I've started to integrate EhCache into my application, which uses JSP and Spring MVC. In my application the user selects an ID from a drop-down list, and a Java class grabs data from the DB according to the ID picked. First the query is executed and it returns a ResultSet object. At this point I am confused about what to do and feel like I'm missing something.
I know I want the object to go into cache if it's not already in there and if it's already in cache then just continue with the loop. But doing things this way requires me to iterate over the whole returned result set from the DB query, which is obviously not the way things are supposed to be done?
So, would you recommend that I just try to cache the whole result set returned? If I did this I guess I could update the list in the cache if the DB table is updated with a new record? Any suggestions on how to proceed and correctly put into EhCache what is returned from the DB?
I know I'm throwing out a lot of questions and I'd certainly appreciate it if someone could offer some help! Here is a snippet of my code so you see what I mean.
rs = sta.executeQuery(QUERYBRANCHES + specifier);
while (rs.next()) {
    // For each set of fields retrieved, use those to create a branch object.
    // String brName = rs.getString("NAME");
    String compareID = rs.getString("ID");
    String fixedRegID = rs.getString("REGIONID").replace("0", "").trim();

    // Check if the branch is already in the cache. If it is not, create the new object
    // and add it to the cache. If the branch is already in the cache, continue.
    if (!cacheManager.isInMemory(compareID)) {
        Branch branch = new Branch(fixedRegID, rs.getString("ID"), rs.getString("NAME"),
                rs.getString("ADDR1"), rs.getString("CITY"), rs.getString("ST"),
                rs.getString("ZIP"));
        cacheManager.addBranch(rs.getString("ID"), branch);
    } else {
        continue;
    }
}
retData = cacheManager.getAllBranches();
But doing things this way requires me to iterate over the whole returned result set from the DB query, which is obviously not the way things are supposed to be done?
You need to iterate in order to fetch the results.
To avoid iterating over all elements, you need to exclude the already cached values from the select.
What I mean is: add an exclusion clause to your select for the values you don't want, in this case the values already cached (NOT LIKE, <>, etc.). This will reduce the iteration time.
Otherwise, yes, I'm afraid you will have to iterate over all returned rows if your SQL filter is not complete.
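A hedged sketch of that exclusion, assuming the cacheManager can list the branch IDs it already holds (getCachedBranchIds is made up for illustration), that a JDBC Connection is available, and that QUERYBRANCHES + specifier already contains a WHERE clause:

Set<String> cachedIds = cacheManager.getCachedBranchIds();   // hypothetical helper

// Push the exclusion into the SQL so the result set only contains branches the cache
// does not already hold.
StringBuilder sql = new StringBuilder(QUERYBRANCHES + specifier);
if (!cachedIds.isEmpty()) {
    String placeholders = String.join(",", Collections.nCopies(cachedIds.size(), "?"));
    sql.append(" AND ID NOT IN (").append(placeholders).append(")");
}

try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
    int idx = 1;
    for (String id : cachedIds) {
        ps.setString(idx++, id);
    }
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // Every row here is new to the cache, so no isInMemory() check is needed.
            String fixedRegID = rs.getString("REGIONID").replace("0", "").trim();
            Branch branch = new Branch(fixedRegID, rs.getString("ID"), rs.getString("NAME"),
                    rs.getString("ADDR1"), rs.getString("CITY"), rs.getString("ST"),
                    rs.getString("ZIP"));
            cacheManager.addBranch(rs.getString("ID"), branch);
        }
    }
}
retData = cacheManager.getAllBranches();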
So, would you recommend that I just try to cache the whole result set returned? If I did this I guess I could update the list in the cache if the DB table is updated with a new record? Any suggestions on how to proceed and correctly put into EhCache what is returned from the DB?
You should not cache highly dynamic business information.
What I recommend is that you use database indexes, which would dramatically increase your performance, and get your values from there. Use pure native SQL if needed.
If you are going to work with a lot of users you will need a lot of memory to keep all those objects cached.
As you start scaling horizontally, cache management is going to be a challenge this way.
If you can, only cache values that won't change or that change very rarely, such as data loaded at application start-up or application parameters.
If you really need to cache business information, please let us know the specifics, like what is the hardware, platform, database, landscape, peak of access, etc.
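For the "cache only near-static values" case, a minimal sketch assuming a hypothetical BranchDao, the Branch class from the question, and a getId() accessor on it; the data is loaded once when the bean is created:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.stereotype.Component;

@Component
public class StaticReferenceCache {

    public interface BranchDao {                       // hypothetical DAO
        List<Branch> findAll();
    }

    private final Map<String, Branch> branchesById = new ConcurrentHashMap<>();

    public StaticReferenceCache(BranchDao branchDao) {
        // Branch master data rarely changes, so one load at start-up is enough;
        // highly dynamic business data should still be read from the database.
        for (Branch b : branchDao.findAll()) {
            branchesById.put(b.getId(), b);
        }
    }

    public Branch byId(String id) {
        return branchesById.get(id);
    }
}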

Synchronizing 2 databases using Hibernate - use save(), update() or saveOrUpdate()?

I am trying to sync multiple databases whose items have GUID for IDs, meaning that one item has the same ID on all databases.
My question is:
If I modify or create an item in one database and want to synchronize this change to the other database, should I:
1.) Check if the item is new or just modified; if it's new, use the save() function, and if it's modified, use the update() function,
or
2.) Not check whether it's new or modified and just use the saveOrUpdate() function?
After seeing your use case in the comments, I think the best approach is to track (on both the client and server) when the last updated/last synced time was. In the event that the last sync time is null, or comes before the last updated time, you know that the data needs to be synced.
Now, on to the heart of your question: how to sync it. The client need not know the state of the server when it sends an object to you. In fact, it shouldn't. Consider the case where the client posts an object, your server receives and processes it, but the connection dies before your client receives the response. This is a very valid scenario and will result in a mismatch of data. As a result, any way that you try to determine whether or not the server has received an object (from the client) is likely to end up in a bad state.
The best solution is really to create an idempotent endpoint on the server (an upsert method, or saveOrUpdate as you referred to it in your question) which is able to determine what to do with the object. The server can query its database by primary key to determine if it has the object or not. If it does, it can update; if not, it can insert.
Understandably, performance is important as well as the data. But, stick with primary keys in the database and that one additional select query you add should be extremely minimal (sub-10ms). If you really want to squeeze some more performance out, you could always use memcache or redis as a caching layer to determine if you have a certain GUID in your database. This way, you only have to hit memory (not your database) to determine if an object exists or not. The overhead of that would be measured only in the latency between your web server and cache server (since a memory read is incredibly cheap).
tl;dr
Upsert (or saveOrUpdate) is the way to go. Try not to track the state of one machine on another.
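A minimal sketch of that idempotent upsert with Hibernate, assuming a Customer entity whose identifier is the shared GUID; the entity and service names are illustrative:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class SyncService {

    private final SessionFactory sessionFactory;

    public SyncService(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Safe to call any number of times with the same payload: one primary-key lookup
    // decides between insert and update, so a retry after a dropped connection cannot
    // create a duplicate or fail on a missing row.
    @Transactional
    public void upsert(Customer incoming) {
        Session session = sessionFactory.getCurrentSession();
        Customer existing = session.get(Customer.class, incoming.getId());
        if (existing == null) {
            session.save(incoming);     // unknown GUID -> insert
        } else {
            session.merge(incoming);    // known GUID -> update the existing row
        }
    }
}

saveOrUpdate() collapses the two branches into a single call; the explicit get() just makes the decision visible.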
