Checking if a Set of items exists in the database quickly - Java

I have an external service from which I'm grabbing a list of items, and I'm persisting locally a relationship between those items and a user. I feed that external service a name and get back the associated items for that name. I am choosing to persist them locally because I'd like to keep my own attributes about those external items once they've been discovered by my application. The items themselves are pretty static objects, but the total number of them is unknown to me, and the only time I learn about new ones is when a new user has an association with them on the external service.
When I get a list of them back from the external service, I want to check if each one exists in my database first and use that object instead; if it doesn't exist, I need to add it so I can set my own attributes and keep the association to my user.
Right now I have the following (pseudocode, since it's broken into service layers etc):
Set<ExternalItem> items = externalService.getItemsForUser(user.name);
for (ExternalItem externalItem : items) {
    Item dbItem = sessionFactory.getCurrentSession().get(Item.class, externalItem.id);
    if (dbItem == null) {
        // Not in the database, create it.
        dbItem = mapToItem(externalItem);
    }
    user.addItem(dbItem);
}
sessionFactory.getCurrentSession().save(user); // Saves the associated Items also.
The time this operation is taking is around 16 seconds for approximately 500 external items. The remote operation is around 1 second of that, and the save is negligible also. The drain that I'm noticing comes from the numerous session.get(Item.class,item.id) calls I'm doing.
Is there a better way to check for an existing Item in my database than this, given that I get a Set back from my external service?
Note: The external item's id is reliably the same as mine, and a single id will always represent the same external item.

I would definitely recommend a native query, as recommended in the comments.
I would not bother to chunk them, though, given the numbers you are talking about. Postgres should be able to handle an IN clause with 500 elements with no problems. I have had programmatically generated queries with many more items than that which performed fine.
This way you also have only one round trip, which, assuming the proper indexes are in place, really should complete in sub-second time.
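A sketch of the diffing step that the single round trip enables, under the assumption that the existing ids come back from one IN-clause query (something like select i.id from Item i where i.id in (:ids) via setParameterList); the BatchResolver name and helper are illustrative, not from the original post:

```java
import java.util.*;
import java.util.stream.*;

// Sketch: resolve a batch of external ids against the database in ONE round
// trip instead of one session.get() per item. "existingIds" stands for the
// result of a single IN-clause query such as:
//   select i.id from Item i where i.id in (:ids)
class BatchResolver {
    // Splits the incoming ids into those already persisted (key true, reuse
    // the entity) and those still missing (key false, create and save them).
    static Map<Boolean, List<Long>> partitionByExistence(Collection<Long> incomingIds,
                                                         Set<Long> existingIds) {
        return incomingIds.stream()
                .collect(Collectors.partitioningBy(existingIds::contains));
    }
}
```

With the split in hand, the loop only calls mapToItem for the `false` bucket, and the 500 per-item gets collapse into one query plus one batched save.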

Related

How to cache data if there is no unique identifier?

I have a spring-boot application with a method that returns some content. Let's say bank customers, and let's say that bank customers are not added to the database very often. How can I cache this method so that it only fetches a fresh list from the database when a new customer has been added?
Sounds trivial enough, but I can't think of a mechanism.
That is, we have a method of say
GET /customers
The cache is keyed on the unique parameters of the request. Since this method has no unique parameters, the data will always be returned from the cache, even after a new client has been added to the list.
For example, you could imagine some method with a boolean response that returns true when the list has been updated. But this looks like a nasty solution, and it also means that instead of one request you have to make two. And if you also have authorization and authentication, that's three requests. That sounds bad.
Can you give me a hint?
Where will you store the cache? Some stores, like Redis, let you set an expiry on the cached data, so it will be refreshed at the interval you set.
Or you can store something like a 'version' in the database and also add it to the cache. Every time you add a new customer, you also bump the version. Then you can compare the version in the db with the one in the cache; if they differ, fetch the new list from the database and re-add it to the cache. But this way you need to call the db every time you hit GET /customers.
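The version-stamp idea can be sketched like this; VersionedCache and the two supplier parameters are hypothetical stand-ins for the real repository calls (one cheap version query, one full list query):

```java
import java.util.*;
import java.util.function.*;

// Sketch of the version-stamp approach: the cache keeps the customer list
// together with the version it was loaded at. Each read compares against the
// current version in the database (a single cheap query) and reloads the full
// list only on a mismatch.
class VersionedCache<T> {
    private long cachedVersion = -1;       // no data loaded yet
    private List<T> cachedValue = List.of();

    synchronized List<T> get(Supplier<Long> loadVersion, Supplier<List<T>> loadAll) {
        long dbVersion = loadVersion.get();    // e.g. SELECT version FROM meta
        if (dbVersion != cachedVersion) {      // stale: reload the full list
            cachedValue = loadAll.get();
            cachedVersion = dbVersion;
        }
        return cachedValue;
    }
}
```

This still costs one db call per GET /customers, as noted above, but it is a single-row read instead of the full customer list.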

Java in-memory cache with subcaches for each client

I have a ConcurrentMap which is my in-memory cache/database for my web app.
There I have stored my entity with the id as key.
This is basically how my entity looks:
public class MyEntity {
private int id;
private String name;
private Date start;
private Date end;
...
}
Now I have multiple users who request different data from my map.
User1 has a filter on the start date, so for example he only gets items 1, 2, and 3 of my map. User2 also has a filter and only gets items 2, 3, 4, and 5 of the map.
So they each only get a part of the complete map. I am doing the filtering on my server because the map is too big to send completely, and I need to check other attributes.
My problem now is that the entries in the map can be updated / removed / added by some other API calls, and I want to live-update the entries on the user side.
For now I am sending a notification to the users that the map has been updated, and then every user reloads the complete data that he needs.
For example, item 8 has been updated. User1 gets a notification and loads items 1, 2, and 3 again even though the update was only on item 8. So in this case the update would be unnecessary for User1, because he doesn't need item 8.
Now I am searching for a good solution so that the user only receives the necessary updates.
One way I was thinking about was to temporarily store all the item ids that a user requested. On an update notification I could then check whether the updated item is in that list and send the updated item to the user only if it is.
But I am concerned about the memory usage this will create if I have a lot of users, since the per-user lists of item ids can also get very big.
What would be a good solution to send only the added / updated / removed item to the user, and only if the user needs that item?
So something like observing only a part of the base map (cache), but with a notification for every action like adding, updating, and removing an item.
Basically there is no "Silver Bullet". A possible solution depends on the usage patterns, your resources and requirements.
Questions to ask:
How big is the total dataset, now and in the future?
How big is the displayed dataset?
How many users with searches?
Is it a business application with a closed user group or a public web application that needs to scale?
What is the update frequency and what kind of updates are common?
How many different search patterns are present?
The biggest question of all:
Is it really needed?
Looking at a bunch of search interfaces, the user expects an update only when there is an interaction. If it is similar to a search, then users will not expect an instant update. An instant update can be useful and "innovative", but you would spend a lot of engineering cost on that problem.
So before jumping onto the engineering task, make sure that the benefit will justify the costs. Maybe check on these alternative approaches first:
Don't do it. Only update on a user interaction. Maybe add a reload button.
Only notify the user that an update occurred, but only update when the user hits "reload". This solves the problem that the user may not have the browser tab in focus and the transfer is a waste.
But I am concerned about the memory usage this will create if I have a lot of users, since the per-user lists of item ids can also get very big.
In our applications we observe that there are not so many different search patterns / filters. Let's say you might have 1000 user sessions, but only 20 different popular searches. If that is true, you can do:
For each search filter, you can store a hash value of the result.
If you do the update of the main data, you run the searches again and only send an update to the user if the hash value changed.
If it is a public web application, you should optimize more. E.g. don't send an update when the application has no focus.
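The hash-per-filter idea above might look like this; SearchHashTracker and resultChanged are illustrative names, and List.hashCode stands in for whatever real digest you would use over the result set:

```java
import java.util.*;

// Sketch of the "hash per search filter" idea: after a data update, re-run
// each popular search and push a refresh only to the sessions whose result
// hash changed. One hash per filter is stored, not one list per user.
class SearchHashTracker {
    private final Map<String, Integer> lastHashByFilter = new HashMap<>();

    // Returns true if the result set for this filter changed since the last
    // check, i.e. the subscribers of this filter should be notified.
    boolean resultChanged(String filterKey, List<Integer> resultIds) {
        int hash = resultIds.hashCode();   // cheap stand-in for a real digest
        Integer previous = lastHashByFilter.put(filterKey, hash);
        return previous == null || previous != hash;
    }
}
```

With 20 popular searches this stores 20 integers, regardless of how many of the 1000 sessions subscribe to each filter.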

How to keep a java list in memory synced with a table in database?

I want to search for an input in a list. That list resides in a database. I see two options for doing that:
Hit the db for each search and return the result.
Keep a copy in memory, synced with the table, search in memory, and return the result.
I like the second option as it will be faster. However, I am confused about how to keep the list in sync with the table.
example : I have a list L = [12,11,14,42,56]
and I receive an input : 14
I need to return whether the input exists in the list or not. The list can be updated by other applications. I need to keep the list in sync with the table.
What would be the most optimized approach here and how to keep the list in sync with database?
Is there any way my application can be informed of the changes in the table so that I can reload the list on demand.
Instead of recreating your own implementation of something that already exists, I would leverage Hibernate's Second Level Cache (2LC) with an implementation such as EhCache.
By using a 2LC, you can specify a time-to-live expiration for your entities; once they expire, any query will reload them from the database. If the entity cache has not yet expired, Hibernate will hydrate entities from the 2LC application cache rather than the database.
If you are using Spring, you might also want to take a look at @Cacheable. This operates at the component / bean tier, allowing Spring to cache a result set into a named region. See their documentation for more details.
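A plain-Java sketch of the time-to-live behaviour a 2LC gives you; it is hand-rolled here only for illustration (in practice EhCache is configured declaratively), and the injected clock exists just to make the sketch self-contained:

```java
import java.util.*;
import java.util.function.*;

// Sketch of TTL caching: an entry is served from memory until its TTL
// elapses, after which the loader (standing in for a database hit) is
// called again on the next access.
class TtlCache<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> loadedAt = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;     // injectable for testing

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized V get(K key, Function<K, V> loader) {
        Long at = loadedAt.get(key);
        if (at == null || clock.getAsLong() - at > ttlMillis) {
            values.put(key, loader.apply(key));   // absent or expired: reload
            loadedAt.put(key, clock.getAsLong());
        }
        return values.get(key);
    }
}
```

Note the trade-off this implies for the question: between reloads, the in-memory list can lag behind the table by up to one TTL.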
To satisfy your requirement, you should control the reads and writes in one place; otherwise, there will always be some unsynced data.

How to sync large lists between client and server

I'd like to sync a large list of items between the client and the server. Since the list is pretty big, I can't sync it in a single request. How can I ensure the list gets synced with a reasonable number of calls to the synchronization service?
For example:
Say I want to sync a list with 100,000 items, so I create a web service with the following signature:
getItems(int offset,int quantity): Item[]
The problem comes when, between call and call, the list is modified. For example:
getItems(0,100) : Return items (in the original list) [0,100)
getItems(100,100): Return items (in the original list) [100,200)
##### before the next call the items 0-100 are removed ####
getItems(200,100): Return items (in the original list) [300,400)
So the items [200,300) are never retrieved. (Duplicate items can also be retrieved if items are added instead of removed.)
How can I ensure a correct sync of this list?
From time to time, the service should save immutable snapshots. The interface should be getItems(long snapshotNumber, int offset, int quantity).
To save time, space, and traffic, not every modification of the list should form a snapshot; instead, every modification should form a log message (e.g. add items, remove a range of items), and those log messages should be sent to the client instead of full snapshots. The interface can be getModification(long snapshotNumber, int modificationNumber): Modification.
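The snapshot-plus-log protocol could be sketched as follows; SnapshotLog and the minimal Modification record are hypothetical types, with the client replaying the log entries it has not yet seen onto its local copy of the snapshot:

```java
import java.util.*;

// Sketch: the server keeps an immutable snapshot plus an append-only log of
// modifications since that snapshot. A client fetches the snapshot once,
// then pulls only the log entries it has not seen and replays them locally.
class SnapshotLog {
    // Minimal modification: either insert 'item' at 'index', or remove at 'index'.
    record Modification(boolean add, int index, String item) {}

    private final List<String> snapshot;
    private final List<Modification> log = new ArrayList<>();

    SnapshotLog(List<String> snapshot) { this.snapshot = List.copyOf(snapshot); }

    void append(Modification m) { log.add(m); }

    List<String> getSnapshot() { return snapshot; }

    // Modifications a client that has already applied 'seen' entries still needs.
    List<Modification> getModificationsSince(int seen) {
        return List.copyOf(log.subList(seen, log.size()));
    }

    // Client-side replay of one log entry onto a local copy of the snapshot.
    static void apply(List<String> local, Modification m) {
        if (m.add()) local.add(m.index(), m.item());
        else local.remove(m.index());
    }
}
```

Because the snapshot is immutable and the log is ordered, every client converges on the same list regardless of when its calls interleave with server-side changes.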
Can you make the list ordered on some parameter on the server side? A real-world use case for this scenario is showing records in a table on a UI: the number of records on the server side can be huge, so you wouldn't want to get the whole list at once; instead you fetch a batch on each scroll the user makes.
In this case, if the list is ordered, you get a lot of things for free, and your API becomes getItems(long lastRecordId, int quantity). Here lastRecordId is a unique key identifying a particular record. You use this key to find the position (on the server side), retrieve the next batch from there, and return the recordId of the last record to the client, which it uses in its next API call.
You don't have to maintain snapshots, and no duplicate records are retrieved; the removal/insertion scenarios you mention don't occur in this case. But at some point you will have to discard the copy the client has and start syncing all over again if you want to track additions and removals in the data the client has already seen.
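The keyset approach can be sketched with a TreeMap standing in for an id-ordered, indexed table; KeysetPager is an illustrative name, not a real API:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of keyset ("seek") pagination: each call returns the batch of
// records whose id is strictly greater than lastRecordId. Removals or
// insertions before the cursor cannot shift the window, so no record is
// skipped or duplicated between calls.
class KeysetPager {
    private final NavigableMap<Long, String> records = new TreeMap<>();

    void put(long id, String value) { records.put(id, value); }
    void remove(long id) { records.remove(id); }

    // Up to 'quantity' records with id > lastRecordId, in id order.
    List<Map.Entry<Long, String>> getItems(long lastRecordId, int quantity) {
        return records.tailMap(lastRecordId, false).entrySet().stream()
                .limit(quantity)
                .collect(Collectors.toList());
    }
}
```

In SQL this corresponds to WHERE id > :lastRecordId ORDER BY id LIMIT :quantity, which also stays fast on large tables because it seeks on the index instead of counting an offset.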

client view of very large collection of objects. How to optimize?

I have a 3-tier EJB application, and I need to create a view in a thick client (a desktop Java application) that shows a very large collection of objects (over 5000 orders). Each object has child properties that are also complex objects, for example:
class Address {
    String value;
    // other properties
}

class Order {
    public String number;
    // This is a collection of complex objects; I need the first and last
    // elements to show their properties in the view.
    public List<Address> getAddresses();
    // other properties
}
The view is a table of Orders:
Number | FirstAddress | LastAddress | ...
My first attempt was to load the full list of orders (without child properties) and then dynamically download child objects when needed for display. But when I have 10000 orders and begin fast scrolling, the UI becomes unresponsive.
Then I tried loading all orders together with all the children that need to be shown in the table, but the UI got very heavy and slow, possibly because of the memory cost. And it's not a thick client at all anymore, because I download almost all the data from the db.
What is best practice to solve this task?
Assuming you are using a JTable as the view of a suitable TableModel, query the database using a SwingWorker and publish() the results as they arrive, fetching in blocks of, say, 10 rows at a time. The UI remains responsive as data accumulates.
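A minimal sketch of such a worker, assuming a DefaultTableModel as the target and with fetchBlock as a hypothetical stand-in for the real database query:

```java
import java.util.*;
import javax.swing.*;
import javax.swing.table.DefaultTableModel;

// Sketch: doInBackground() runs off the EDT and fetches rows in blocks,
// publish()ing them as they arrive; process() runs on the event-dispatch
// thread and appends the rows to the TableModel, so the UI stays responsive
// while data accumulates.
class OrderLoader extends SwingWorker<Void, String[]> {
    private final DefaultTableModel model;

    OrderLoader(DefaultTableModel model) { this.model = model; }

    @Override protected Void doInBackground() {
        for (int block = 0; block < 3; block++) {
            for (String[] row : fetchBlock(block, 10)) {
                publish(row);                 // hand rows to the EDT as they arrive
            }
        }
        return null;
    }

    @Override protected void process(List<String[]> rows) {
        for (String[] row : rows) model.addRow(row);   // runs on the EDT
    }

    // Stand-in for the real database query returning one block of rows.
    private List<String[]> fetchBlock(int block, int size) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            rows.add(new String[]{"order-" + (block * size + i)});
        }
        return rows;
    }
}
```

Start it with new OrderLoader(model).execute(); the table fills in incrementally while scrolling stays smooth.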
Follow the Value Object or Data Transfer Object pattern: send only what you really need. Instead of sending a graph of domain objects, create one or more 'stupid' flat objects (containing simple attributes) per view.
I suggest implementing some sort of pagination; in other words, you'll have to implement a mechanism for retrieving only a small subset of all your data and showing it chunk by chunk on different pages.
Exactly how depends on your approach so far:
you can either use some programming pattern like those already mentioned,
or you can implement it at the DB level, where, depending on the chosen DBMS, you write the fetch queries in such a manner that they retrieve only a portion of all the data.
Hope this helps!
It's advised to make a proxy object for your list that fetches only a small part of its elements, plus the total count, and can then load other parts of the original list on demand.
