I have a generic problem where I am loading data from the backend in blocks, i.e. in pages. I have created a cache which stores a maximum of 2-3 pages at a time.
Say page 1 covers rows 1-1000 and page 2 covers rows 1001-2000.
class Page {
    List<Data> rows;
    int startOffset;
    int endOffset;
    int pageNo;
}
Here the client could be a UI or any other service.
Now the client asks for data in ranges like 1-100 or 101-200. As long as the range can be served from a single page, I can handle it by calculating the page number from the supplied range.
If the page is not there, I can load that range from the backend and keep it in the cache.
However, I am facing an issue when the client requests data that spans multiple blocks.
For example, when the client asks for the range 950-1050, the data spans two pages.
Any suggestion on how to model the classes/blocks in such a case, i.e. how to keep server-side data in memory in blocks and send it to the GUI?
I don't see a problem in using two (or even more) contiguous blocks fetched from the DB, enough to cover the region requested by the UI. You can do it either eagerly or lazily (that is, discover that you don't have enough rows and fetch the extra ones). This is a pretty normal situation for the kind of process you are trying to execute; overlap is always present when the user is free to choose the desired region.
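A minimal sketch of that idea, assuming a fixed page size of 1000 and a hypothetical Backend.loadRows(startOffset, endOffset) loader (the names are illustrative, not from the question); eviction of old pages is left out:

import java.util.*;

// Sketch: serves arbitrary row ranges by stitching together fixed-size pages,
// loading missing pages lazily from the backend. Pages are 0-indexed internally.
class PageCache {
    static final int PAGE_SIZE = 1000;

    private final Map<Integer, List<Data>> pages = new LinkedHashMap<>(); // pageNo -> rows
    private final Backend backend;

    PageCache(Backend backend) { this.backend = backend; }

    // Returns rows for the inclusive, 1-based range [start, end].
    List<Data> getRange(int start, int end) {
        List<Data> result = new ArrayList<>();
        int firstPage = (start - 1) / PAGE_SIZE;
        int lastPage  = (end - 1) / PAGE_SIZE;
        for (int pageNo = firstPage; pageNo <= lastPage; pageNo++) {
            int pageStart = pageNo * PAGE_SIZE + 1;                    // absolute offset of this page's first row
            List<Data> page = pages.computeIfAbsent(pageNo,
                    p -> backend.loadRows(pageStart, pageStart + PAGE_SIZE - 1)); // lazy load of a missing page
            int from = Math.max(start, pageStart);
            int to   = Math.min(end, pageStart + PAGE_SIZE - 1);
            result.addAll(page.subList(from - pageStart, to - pageStart + 1));    // slice of this page inside the range
        }
        return result;   // e.g. 950-1050 -> tail of page 0 plus head of page 1
    }

    interface Backend { List<Data> loadRows(int startOffset, int endOffset); }
    static class Data { /* row fields */ }
}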
Related
I have some problems understanding the best concept for my problem.
My architecture is pretty basic. I have a backend with data that can be updated and clients which load that data with some filters.
The backend holds the data in an EHCache.
The data model is pretty basic, for example:
{
id: string,
startDate: date,
endDate: date,
username: string,
group: string
}
The data can only be modified by another backend service.
When data is modified, added or deleted, a data update event is generated.
The clients are all web clients and use a Spring Boot REST service to fetch the data from the cache.
With each data request a client sends its own request settings. There are different settings like date and text filters. For example:
{
contentFilter: Filter,
startDateFilter: date,
endDateFilter: date
}
The backend uses these settings to filter the data from the cache and then sends the response with the filtered data.
When the cache generates an update event, every client is notified over a websocket connection.
Each client then requests the full data again with the same request settings as before.
My problem now is that there are many cache updates happening, and the clients can have a lot of data to load if the full dataset is loaded every time.
For example, I have this scenario:
Full dataset in cache: 100 000 rows
Update of rows in cache: 5-10 random rows every 1-5 seconds
Client1 dataset with request filter: 5000 rows
Client2 dataset with request filter: 50 rows
Now every time a client receives an update notification, it loads the complete dataset (5000 rows), and that happens every 1-5 seconds. If the updates only ever happen on the same rows, and those rows aren't loaded by the client because of its filter settings, then the client is loading the data unnecessarily.
I am not sure what would be the best solution to reduce the client updates and increase the performance.
My first thought was to just send the updated row directly to the clients over the websocket connection.
But for that I would have to know whether a client "needs" the updated row. If the updates happen on rows that a client doesn't need to load because of its filter settings, I would spam that client with unnecessary updates.
I could add a check on the client side to see whether the id of the updated row is in the loaded dataset, but then I would need a separate check for rows that are added to the cache rather than updated.
But I am not sure if that is best practice, and unfortunately I cannot find many resources on this topic.
The most efficient things are always the most work, sadly.
I won't claim to be an expert at this kind of thing - on either the implementation(s) available or even the best practices - but I can give some food for thought at least, which may or may not be of help.
My first choice: your first thought.
You have the problem of knowing if the updated item is relevant to the client, due to the filters.
Save the filters for the client whenever they request the full data set!
A row gets updated; check through all the client filters to see if it is relevant to any of them, and push it out to those it is.
The effort for maintaining that filter cache is minimal (update whenever they change their filters), and you'll also be sending down minimal data to the clients. You also won't be iterating over a large dataset multiple times, just the smaller client set and only for the few rows that have been updated.
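A rough sketch of that approach, assuming a hypothetical ClientFilter mirroring the request settings above and a sendToClient(...) hook into your websocket layer (both invented for illustration; how the text filter matches is also just a guess):

import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keeps the last filter each client sent and pushes only matching row updates.
class UpdateDispatcher {

    record Row(String id, LocalDate startDate, LocalDate endDate, String username, String group) {}

    record ClientFilter(String contentFilter, LocalDate startDateFilter, LocalDate endDateFilter) {
        boolean matches(Row row) {
            boolean content = contentFilter == null
                    || row.username().contains(contentFilter)      // however your text filter is defined
                    || row.group().contains(contentFilter);
            boolean after  = startDateFilter == null || !row.endDate().isBefore(startDateFilter);
            boolean before = endDateFilter == null || !row.startDate().isAfter(endDateFilter);
            return content && after && before;
        }
    }

    // clientId (e.g. websocket session id) -> last filter the client requested with
    private final Map<String, ClientFilter> filtersByClient = new ConcurrentHashMap<>();

    void rememberFilter(String clientId, ClientFilter filter) {
        filtersByClient.put(clientId, filter);          // updated on every full-data request
    }

    void onRowUpdated(Row row) {
        filtersByClient.forEach((clientId, filter) -> {
            if (filter.matches(row)) {
                sendToClient(clientId, row);            // push only to clients the row is relevant to
            }
        });
    }

    private void sendToClient(String clientId, Row row) {
        // hypothetical hook: hand the row to your websocket layer here
    }
}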
Another option:
If you don't go ahead with option 1, option 2 might be to group updates - assuming you have the luxury of not needing immediate, real-time updates.
Instead of telling the clients about every data update, only tell them every x seconds that there might be data waiting for them (might be, you little tease).
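If you go that route, a small sketch of coalescing the update events into one periodic ping (the 5-second interval is just an example):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Collapses many cache-update events into one "data may have changed" ping every few seconds.
class BatchedNotifier {

    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    BatchedNotifier() {
        // every 5 seconds, notify clients only if at least one update happened since the last ping
        scheduler.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    void onCacheUpdate() {
        dirty.set(true);                 // called for every update event, very cheap
    }

    private void flush() {
        if (dirty.compareAndSet(true, false)) {
            notifyAllClients();          // one websocket ping instead of one per update
        }
    }

    private void notifyAllClients() {
        // hypothetical hook into your websocket broadcast
    }
}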
I was going to add other options but, to be honest, I don't see why you'd worry about much beyond option 1, maybe with an option 2 addition to reduce traffic if that's an issue.
'Best practice'-wise, sending down multiple FULL datasets to multiple clients multiple times a second is certainly not it.
Sending only the data relevant to each client is a much better solution, and reducing how much the client even needs to send (i.e. only their filter updates, rather than re-sending something you could already have saved) is an added bonus.
Edit:
Ah, stateless server - though it's not really stateless. You're using web sockets, so the server has some kind of state for those connections. It's already stateful so option 1 doesn't really break anything.
If it's to be completely stateless, then you also can't store the updated rows of data, so you can't return those individually. You're back to what you're doing which is a full round-trip and data read + serve.
Option 3, though, if you're semi-stateless (you don't want to add any metadata to those socket connections) but do hold updated rows: timestamp them and have the clients send the time of their last update along with their filters. You can then return only the rows updated since their last request, using their provided filters - the timestamp just becomes another filter, so this can even stay fully stateless.
Either way, limiting the update data sent back down to each client is the main goal, if for nothing else than saving data transfer.
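A rough sketch of that timestamp idea, assuming each cached row carries a lastModified field (which the current model doesn't have yet, so that's an addition):

import java.time.Instant;
import java.util.List;

// Client sends its filters plus the timestamp of its last successful update;
// the server returns only rows modified since then that also match the filters.
class DeltaService {

    record Row(String id, Instant lastModified /* plus the domain fields */) {}
    record DeltaRequest(Instant since /* plus contentFilter, startDateFilter, ... */) {}

    private final List<Row> cache;   // however the EHCache contents are exposed

    DeltaService(List<Row> cache) { this.cache = cache; }

    List<Row> fetchDelta(DeltaRequest request) {
        return cache.stream()
                .filter(row -> row.lastModified().isAfter(request.since()))   // timestamp is just another filter
                // .filter(row -> matchesClientFilters(row, request))         // existing filter logic goes here
                .toList();
    }
}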
Edit 2:
Sounds like you may need to send two bits of data down (or three if you want to split things even further - makes life easier client-side, I guess):
{
newItems: [{...}, ...],
updatedItems: [{...}, ...],
deletedIds: [1,2...]
}
Yes, when their request for an update comes in, you'll have to check through your updated items to see whether any were deleted and are relevant to the client's filters, but you can send down a minimal list of ids, rather than whole rows, that your client can then remove.
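As a plain DTO, that payload could look roughly like this (field names follow the JSON above):

import java.util.List;

// Response describing only what changed since the client's last update.
class DeltaResponse<T> {
    List<T> newItems;        // rows added that match the client's filters
    List<T> updatedItems;    // rows changed that match the client's filters
    List<String> deletedIds; // ids only, so the client can drop them locally
}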
In Angular 8+, if we need to display a list of records, we display the results with pagination.
We have more than 1 million records, and the number will keep increasing in the future.
I am using Spring Boot and MySQL as the database.
But what would be the preferable approach:
Getting all the data from the server at once and handling pagination on the client side.
Getting 10 records at a time, displaying them, and fetching the next 10 from the server when the user clicks the Next button.
I think you should use pagination rather than getting all the data from the server.
Getting all the data from the server is a costly operation, since as you mention your application has more than a million records.
With pagination, the API is called only when needed and returns data based on your per-page pagination request.
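With Spring Boot and Spring Data JPA most of this is built in; a minimal sketch in the Spring Boot 2.x (javax.persistence) style, with placeholder entity and endpoint names:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@Entity
class RecordEntity {
    @Id
    Long id;
    // the other columns of your table
}

// Paging support comes for free from JpaRepository.
interface RecordRepository extends JpaRepository<RecordEntity, Long> {}

@RestController
@RequestMapping("/records")
class RecordController {

    private final RecordRepository repository;

    RecordController(RecordRepository repository) {
        this.repository = repository;
    }

    // GET /records?page=0&size=10 translates to a MySQL query with LIMIT/OFFSET,
    // so only the 10 displayed rows ever travel to the Angular client.
    @GetMapping
    Page<RecordEntity> list(@RequestParam(defaultValue = "0") int page,
                            @RequestParam(defaultValue = "10") int size) {
        return repository.findAll(PageRequest.of(page, size));
    }
}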
I would strongly advise you to go with variant #2.
The main reason to do pagination is not really because it makes sense to only display a few entries in the UI at once. Instead, pagination allows you to only transfer the necessary entries from large data sets (such as yours). This greatly improves performance and reduces the amount of data that has to be sent from the server to the client.
Variant #1 will have very poor performance, because the client has to fetch all 1,000,000 records to then only display 10 of them. This does not make a lot of sense and goes directly against the idea and the advantages of pagination.
Variant #2, on the other hand, will only fetch the entries that are actually displayed. It will transfer roughly 0.001% of the data that variant #1 would.
I would use something in between and load maybe 100 or 1,000 records. With one million your browser will run out of memory, and with 10 your user gets bored...
A servlet returns about 4000 rows as a JSON object.
Without any processing in JavaScript, the browser is unresponsive for a second or two.
When that data is processed in a for loop, it is unresponsive for about 4 seconds.
Is that JSON object too large for the browser to handle, given that it is unresponsive even without processing?
Any idea how to solve that?...Thanks.
If it is slower than you are happy with, you're probably going to have to chunk the data.
Only fetch the data that is going to be shown on page load and then request the rest as needed. If it's for a grid, load the first 20 records and then fetch what is needed per request.
If you need ALL of the data:
1) only fetch the data that shows on the page
2) load the rest of the data in the background
3) merge the data behind the scenes
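On the server side that means the servlet has to accept a window instead of always returning all 4000 rows; a rough sketch with hypothetical offset/limit parameters (javax.servlet style, data access left as a stub):

import java.io.IOException;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Serves a window of rows instead of the whole result,
// so the first response is small and the rest can be fetched in the background.
public class RowServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        int offset = parseIntOrDefault(req.getParameter("offset"), 0);
        int limit  = parseIntOrDefault(req.getParameter("limit"), 20);   // first screenful by default

        List<String> rows = loadRowsAsJson(offset, limit);               // hypothetical data lookup

        resp.setContentType("application/json");
        resp.getWriter().print("[" + String.join(",", rows) + "]");
    }

    private int parseIntOrDefault(String value, int fallback) {
        if (value == null) return fallback;
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    private List<String> loadRowsAsJson(int offset, int limit) {
        // fetch rows [offset, offset + limit) from your data source and serialize each one
        return List.of();
    }
}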
So far I've seen examples that use the following logic:
Create a table / grid object
Set its data source (Collection such as array list/ set)
The table shows the entries on the client side!
The problem is, we have millions of rows to display. (On a side note, I tried to load the container with all the entries; it took tons of time, and client-side performance was lacking.)
So that raises the question:
How do you show a huge amount of data in ZK tables / grids? Wishful thinking points me towards setting a DB connection or something instead of an array list data source, so that the results are managed on demand with paging.
Any ideas?
Why load data when you are not displaying all the rows at once?
Retrieve only the data that should be displayed, and load the rest on demand rather than while the page is initially loading.
If you try to fetch 1 million rows and bind them to a control, it will hugely affect your application's performance and increase the time your page takes to load.
So my advice is to fetch only the rows that need to be displayed. If the user requests the next page, load that data and bind it.
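Whatever ZK model you plug this into, the on-demand part boils down to asking the database for one page at a time; a minimal JDBC sketch with made-up table and column names:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

// Fetches exactly one page of rows, so the grid never holds millions of entries in memory.
class PagedRowDao {

    private final DataSource dataSource;

    PagedRowDao(DataSource dataSource) { this.dataSource = dataSource; }

    List<String> fetchPage(int pageNo, int pageSize) throws SQLException {
        String sql = "SELECT name FROM big_table ORDER BY id LIMIT ? OFFSET ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, pageNo * pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> rows = new ArrayList<>();
                while (rs.next()) {
                    rows.add(rs.getString("name"));
                }
                return rows;   // the grid's data model asks for the next page only when needed
            }
        }
    }
}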
You can make AJAX calls to avoid refreshing the entire page every time.
Hope this helps.
ZK provides BigListbox for showing huge record sets.
I have a 20-column grid with anywhere from 100 to 1,000 rows.
If each cell averages 50 characters, I would estimate that a 1000-row grid would consist of 20x50x1000 characters = 1MB.
The data for this grid has to be returned by the server in one (or more) AJAX requests. The grid is un-editable... it is just a way of representing a lot of information (about human genes, in particular).
I am having a hard time deciding whether I should return this in one AJAX request or several. Do you think this is too much data (1MB) to return in the XML/JSON response of one AJAX request? Is this an anti-pattern? Or does it make sense seeing how all the data is logically part of one grid?
This is more of a design question than anything else. I appreciate any feedback.
Could you not load all the data once using a non-Ajax request and then update only the cells that change via Ajax?
Maybe it would be interesting to keep the grid "state" on the server, so that after each field is edited you send the new contents to the server. That would increase server usage and bandwidth, but it would make things more responsive when the user sends the "submit" command. It also allows faster input validation (showing an error message almost immediately after the user has modified a cell, not half an hour later).
As an improvement, to be on the safe side, keep in memory (JS memory) a list of "dirty" (modified) fields and clear each entry when its related AJAX response tells you that the server has acknowledged the call; when the user hits "submit", all fields still dirty are sent to the server again.
That said, as long as you stay away from XML, I do not think the load is that heavy (of course that will depend on the hardware and the number of concurrent users you have to service).
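A tiny sketch of what that server-held state could look like, with a made-up GridState class that applies and validates each cell edit as it arrives:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Server-side copy of the grid: each cell edit is applied as it arrives,
// so "submit" only has to confirm state the server already holds.
class GridState {

    private final Map<String, String> cells = new ConcurrentHashMap<>(); // "row:col" -> value

    // called from the AJAX handler for every edited cell
    String applyEdit(int row, int col, String value) {
        cells.put(row + ":" + col, value);
        return validate(value);   // return an error message right away, or null if the value is fine
    }

    private String validate(String value) {
        // hypothetical per-cell validation; runs immediately instead of at submit time
        return value == null || value.isBlank() ? "value must not be empty" : null;
    }
}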
I'd suggest you fetch the data in batches, i.e. fetch the first 20-25 rows, or just enough to fill the viewport, then gradually load the next batch as the user scrolls through or nears the end of the previous batch. That would make it feel seamless.
Fetching all the data at once might not be an option considering the number of records you might have.
Furthermore, it's not just about the amount of data, it's about the time it takes to fetch and process it. Depending on how you manipulate the JSON response, the browser can handle this amount of data.