Pagination vs. all data from server - Java

In Angular 8+, if we need to display a list of records, we display the results in a paginated way.
We have more than 1 million records, and the count will keep growing.
I am using Spring Boot with MySQL as the database.
Which would be the preferable approach?
1. Get all the data from the server at once and handle pagination on the client side.
2. Get 10 records at a time, and when the user clicks the Next button, fetch the next 10 records from the server.

I think you should use pagination rather than fetching all the data from the server.
Getting all the data from the server is a costly operation, especially since, as you mention, your application has more than a million records.
With pagination, the API is called only when needed, and it returns just the data for the requested page.

I would strongly advise you to go with variant #2.
The main reason to do pagination is not really because it makes sense to only display a few entries in the UI at once. Instead, pagination allows you to only transfer the necessary entries from large data sets (such as yours). This greatly improves performance and reduces the amount of data that has to be sent from the server to the client.
Variant #1 will have very poor performance, because the client has to fetch all 1,000,000 records to then only display 10 of them. This does not make a lot of sense and goes directly against the idea and the advantages of pagination.
Variant #2, on the other hand, will only fetch the entries that are actually displayed, transferring roughly 0.001% of the data that variant #1 would (10 records out of 1,000,000).
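To make variant #2 concrete, here is a minimal sketch with Spring Boot and Spring Data JPA. The Customer entity and the endpoint path are invented for illustration; on Spring Boot 2 the persistence imports are javax.persistence instead of jakarta.persistence.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@Entity
class Customer {
    @Id @GeneratedValue Long id;
    String name;
}

// findAll(Pageable) is inherited and translates to a LIMIT/OFFSET query
// in MySQL, so only one page of rows ever crosses the wire.
interface CustomerRepository extends JpaRepository<Customer, Long> {}

@RestController
@RequestMapping("/api/customers")
class CustomerController {
    private final CustomerRepository repository;

    CustomerController(CustomerRepository repository) {
        this.repository = repository;
    }

    // GET /api/customers?page=0&size=10 -> 10 rows plus paging metadata
    @GetMapping
    Page<Customer> list(@RequestParam(defaultValue = "0") int page,
                        @RequestParam(defaultValue = "10") int size) {
        return repository.findAll(PageRequest.of(page, size));
    }
}
```

The Angular client then only needs to send page and size, and read content and totalElements from the returned JSON to drive its paginator.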

I would use something in between: load maybe 100 or 1000 records. With one million, your browser will run out of memory, and with 10 per page your user gets bored clicking Next...

Related

Should filter logic be on the frontend or backend?

I'm creating a web application.
Frontend: React; backend: Java.
Frontend and backend communicate with each other via REST.
On the UI I show a list of items, and I need to filter them by some parameters.
Option 1: filter logic on the frontend.
In this case I just make one GET call to the backend and fetch all the items.
When the user chooses a filter option, the filtering happens on the UI.
Pros: I don't need to send data to the backend and wait for a response, so the list refreshes faster.
Cons: if I ever need multiple frontend clients, say a mobile app, then I need to implement the filters again in that app too.
Option 2: filter logic on the backend.
In this case I get all the list items when the app loads. When the user changes the filter options, I send a GET request with the filter params and wait for the response.
After that, I update the list of items on the UI.
Pros: the filter logic is written only once.
Cons: it will probably be much slower, because every filter change requires a round trip to the server.
Question: where should the filter logic live, the frontend or the backend? What is the best practice?
Filter and limit on the back end. If you had a million records and a hundred thousand users trying to access those records at the same time, would you really want to send a million records to EVERY user? It would kill your server and the user experience (waiting for a million records to come from the back end for every user, and then rendering them on the front end, would take ages compared to fetching 20-100 records and clicking a pagination button to retrieve the next 20-100). On top of that, filtering a million records on the front end would again take a very long time and ultimately not be very practical.
From a real-world standpoint, most websites have some sort of record limit: eBay = 50-200 records, Amazon = ~20, Target = ~20, etc. This ensures quick server responses and a smooth user experience for every user.
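If you adopt a record limit like those sites do, enforce it on the server rather than trusting the client, since query parameters can always be tampered with. A tiny sketch (the constants are arbitrary choices):

```java
// Illustrative guard: the server, not the client, decides how many rows
// a single request may return. The numbers are arbitrary; tune them.
final class PageSizes {
    static final int DEFAULT = 20;  // typical storefront page size
    static final int MAX = 100;     // hard cap on rows per request

    static int clamp(int requested) {
        if (requested < 1) {
            return DEFAULT; // nonsense input falls back to the default
        }
        return Math.min(requested, MAX);
    }

    private PageSizes() {}
}
```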
This depends on the size of your data.
For example, if you have a large amount of data, it is better to implement the filter logic on the backend and let the database perform the operations.
If you have a small amount of data, you can do the filtering on the frontend after fetching it.
Let's understand this with an example.
Suppose you have an entity with 100,000 records and you want to show it in a grid.
In this case it is better to fetch 10 records per call and show them in the grid.
If you want to perform any filter operation on this data, it is better to run a query against the database on the backend and return the results (see the sketch below).
If, on the other hand, your entity holds just 1,000 records, it will be beneficial to fetch all the data and do all the filtering on the frontend.
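For the "let the database perform the operations" case, here is a plain-JDBC sketch of a filtered, paged query; the table and column names (item, category, name) are invented for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative: push both the filter and the paging into MySQL.
class ItemQueries {
    List<String> findByCategory(Connection conn, String category,
                                int page, int size) throws SQLException {
        String sql = "SELECT name FROM item WHERE category = ? "
                   + "ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, category);
            ps.setInt(2, size);
            ps.setInt(3, page * size);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
                return names; // at most `size` rows, already filtered by the db
            }
        }
    }
}
```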
Most likely begin with the frontend (unless you're dealing with huge amounts of data):
1. Implement filtering on the frontend (unless for some reason it's easier to do on the backend, which I find unlikely).
2. Iterate until the filtering functionality is somewhat stable.
3. Analyze your traffic and see whether it makes sense to put the effort into implementing backend filtering. Check what percentage of requests are actually filtered and what savings backend filtering would give you.
4. Implement (or not) backend filtering depending on the results of #3.
As a personal note, the accepted answer is terrible advice:
"If you had a million records, and a hundred thousand users trying to access those records at the same time": nothing forces those hundred thousand users to use filtering, so your system should be able to handle that doomsday scenario anyway. Backend filtering should be just an optimization, not the solution.
Once you do filtering on the backend, you'll probably want to do pagination as well; that is not a trivial feature if you want consistent results.
Backend filtering is likely to become much more complex than frontend filtering; be aware that you're going to spend a significant amount of time (not only on the initial implementation but also on ongoing maintenance) and ask yourself whether it's premature optimization.
TL/DR: do whatever is easier for you and don't worry about it until it makes sense to start optimizing.
It depends on the specific requirements of your application, but in my opinion the safer bet would be the back end.
Considering you need filtering in the first place, I assume you have enough data that paging through it is required. In that case, the filtering has to be on the back end.
Let's say you have a page size of 20. After applying the filter, you would expect a page of 20 entities matching that filtering criteria in the UI. This can't be achieved if you fetch 20 entities, store them on the front end, and afterwards apply the filter to them.
Also, if you have enough data, fetching all of it into the front end will be impossible due to memory constraints.

Data Fetch issue in Java

From a Java application, if I need to fetch 100,000 records from any RDBMS, what are the things I should consider? Will it be fetched by a simple SELECT statement?
What are the things I should consider?
The most obvious thing is that it could take a long time to transfer 100,000 "huge" records over a JDBC connection.
You might want to look at alternatives ... like a database specific data extraction tool of some kind.
will it be fetched by a simple select statement?
If you are willing to wait long enough for the transfer to complete, yes.
Suppose the application has to fetch all the records and display them in the UI, in a Java EE application using Spring MVC with Hibernate as the ORM layer.
More things to consider:
Attempting to display 100,000 records to a user in a single page is a bit crazy. No user is going to want to scroll through 100,000 records.
If you are doing this kind of thing via an ORM, you are liable to use an inordinate amount of server-side memory.
My advice: don't fetch all 100,000 records. Instead, fetch the first N, and implement a scheme that allows the user to page through the records.
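If you do end up transferring everything over JDBC, at least avoid materialising the whole result set in client memory at once. Here is a sketch of row-by-row streaming; the Integer.MIN_VALUE fetch size is a MySQL Connector/J convention (other drivers behave differently, e.g. PostgreSQL streams with a positive fetch size and autocommit off), and the table name is invented:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative: stream a large result set instead of buffering all of it.
class BigTableReader {
    void streamAll(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // MySQL: fetch row by row
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, payload FROM big_table")) {
                while (rs.next()) {
                    handle(rs.getLong("id"), rs.getString("payload"));
                }
            }
        }
    }

    private void handle(long id, String payload) {
        // process one row at a time; the full 100,000 never sit in memory
    }
}
```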

Processing millions of records from mysql in java and store the result in another database

I have around 15 million records in MySQL (read only) which are fetched using joins over 10 tables. Around 50,000 new records are inserted daily, and the number will keep increasing.
Each record will be processed independently by a Java program. Multiple processing steps will be applied to the same record, and the output will be calculated based on that processing.
The results will be stored in another database.
Processing must be completed within an hour.
My questions are:
How do I design the processing engine (a cluster of Java programs) in a distributed manner to make the processing as fast as possible? To be more precise, I want to boot many spot instances at that time and finish the processing quickly.
Will MySQL be a read bottleneck?
I don't have any experience with big data solutions. Should I use Spark or some other map-reduce solution? If yes, how should I proceed?
I was in a similar situation where we were collecting about 15 million records per day. What I did was create collection tables that I rotated through and performed the initial processing on. Once that was done, I moved the data to the next phase, where further processing happened before it was added to the large collection of data. Breaking it down like this gives the best performance and avoids having to churn through one huge data set.
I'm not sure what you mean by processing the data, or why you want to do it in Java; you may have a good reason for that. I would imagine performance would be much better if you offloaded as much of the processing as possible to MySQL and let it do the work.
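To make "breaking it down" concrete: one common pattern is to partition the primary-key range and give each worker its own slice and its own connection. Below is a rough single-JVM sketch, assuming a numeric auto-increment key and invented table names; a real cluster would hand the ranges to separate machines (e.g. via a work queue) rather than threads:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

// Illustrative: split the rows into id ranges and process them in parallel,
// so MySQL serves several moderate range scans instead of one giant one.
class ChunkedProcessor {
    private static final long CHUNK = 100_000;
    private final DataSource readOnlyMysql;

    ChunkedProcessor(DataSource readOnlyMysql) {
        this.readOnlyMysql = readOnlyMysql;
    }

    void run(long maxId) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // tune per box
        for (long start = 0; start < maxId; start += CHUNK) {
            final long lo = start;
            final long hi = start + CHUNK;
            pool.submit(() -> processRange(lo, hi));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // the stated deadline
    }

    private void processRange(long lo, long hi) {
        String sql = "SELECT id, data FROM source_row WHERE id >= ? AND id < ?";
        try (Connection conn = readOnlyMysql.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // compute the output for this record and write it
                    // to the result database (omitted here)
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException("range " + lo + "-" + hi + " failed", e);
        }
    }
}
```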

Web Service Architecture: Redis (as cache) & PostgreSQL for persistence

I'm developing a Java REST API that uses client data from a PostgreSQL database.
The numbers:
- About 600 clients at the beginning
- Some of them making requests every few seconds
Because clients pay per request, we need to check whether their number of successful requests has reached their limit. Since hitting PostgreSQL (to update the value of the 'hitsCounter' field) after every request is bad in terms of performance, we are thinking about implementing a caching system with Redis.
The idea:
After a client makes his first request, we retrieve his data from PostgreSQL and store it in the Redis cache. Then we work with this cached data, for example incrementing the 'hitsCounter' key value, until the client stops making requests.
In parallel, every few minutes a background process persists the data from the Redis cache to the database tables, so in the end the updated data is back in PostgreSQL and we can work with it in the future.
I think this obviously increases performance, but I'm not sure about the "background process". One option is to check the TTL of the cache elements, and if it is below some value (meaning the client has finished making requests), persist the data.
I would love to hear some opinions about this. Is this a good idea? Do you know of any better alternatives?
Perfectly reasonable idea, but you've not mentioned any measurements you've made. What is the bottleneck on your target hardware at your target transaction levels? Without knowing that, you can't say.
You could perhaps use an unlogged table. Just insert a row with every query, then summarise every 5 minutes, clearing out the old data. Then again, with HOT updates and, say, a 75% fill factor, maybe updates are more efficient. I don't know (and nor do you): we haven't measured it.
Not enough? Stick it on its own tablespace on ssd.
Not enough? Stick it on its own vm/machine.
Not enough? Just write the damn stuff to flat files on each front-end box and batch the data into the database once a minute.
Also - how much are they paying per query? Do you care if power fails and you lose five seconds of query logs? Do you need to be able to reproduce receipts for each query with originating details and a timestamp?
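For what it's worth, the Redis side of the counter idea is small. Here is a minimal sketch using the Jedis client; the key naming and the table/column names are invented, and getDel requires Redis 6.2+ with Jedis 4.x:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;
import redis.clients.jedis.JedisPooled;

// Illustrative sketch of the proposed scheme: count hits in Redis,
// flush them back to PostgreSQL every few minutes.
class HitCounter {
    private final JedisPooled redis;
    private final DataSource postgres;

    HitCounter(JedisPooled redis, DataSource postgres) {
        this.redis = redis;
        this.postgres = postgres;
    }

    /** Called on every billable request: one in-memory INCR, no SQL. */
    long recordHit(String clientId) {
        return redis.incr("hits:" + clientId);
    }

    /** Run from a scheduler every few minutes: move counters back to the db. */
    void flush(Iterable<String> clientIds) throws SQLException {
        String sql = "UPDATE client SET hits_counter = hits_counter + ? WHERE id = ?";
        try (Connection conn = postgres.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String clientId : clientIds) {
                // GETDEL reads and resets the counter atomically (Redis 6.2+)
                String counted = redis.getDel("hits:" + clientId);
                if (counted != null) {
                    ps.setLong(1, Long.parseLong(counted));
                    ps.setString(2, clientId);
                    ps.executeUpdate();
                }
            }
        }
    }
}
```

Note that, exactly as the questions above suggest, anything counted after the last flush is lost if Redis goes down; whether that is acceptable depends on how much a request is worth.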

Server-side caching for a Java/Java EE application

Here is my situation: I have a Java EE single-page application. All client-server communication is AJAX-based, with JSON as the data exchange format. One of my requests takes around 1 minute to calculate the data required by the client. This data is also huge (could be > 20 MB), so it is not possible to pass it all to JavaScript in one go. For this reason I am only passing a few records to the client and using a grid with a paging option to display the data.
Now, when the user clicks the next-page button, I need to fetch more data. My question is: how do I cache the data on the server side? I need this data for only one user at a time. Would you recommend caching all the data on the first request, using the session id as the key?
Any other suggestions?
I am assuming you are using a DB backend for this. I'd use limits to return small chunks of data; most DB vendors have a solution for this. That would make your queries faster, and most JS frameworks with grid-type components support paginated results (ExtJS, for example).
If you are fetching the data from a third party and passing it on (with or without modifications), I'd still stick with the database and use this workflow: pull the data from the third party, save it in the db, and serve your widget the small chunks the customers ask for.
Hope this helps.
The cheapest (and not entirely ineffective) way of caching data in a Java EE web application is to use the Session object, as you intend to do. It is fragile, though, since it requires the developer to ensure that the cache does not leak memory; it is up to the developer to null the reference to the object once it is no longer needed.
However, even if you do implement this poor man's cache, caching 20 MB of data is not advisable, as it does not scale well. The scalability question arises when multiple users hit the same functionality, at which point 20 MB per user is a lot of data.
You're better off returning paginated "datasets" as JSON, based on the Value List design pattern. Each request for the query of data results in a partial retrieval of it, which is then sent down the wire to the client. That way, you never have to cache the complete results of the query execution, and you can return partial datasets. Whether you cache at all is entirely up to you; caching is usually done for large datasets that are used time and again.
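As a reference point for the session-cache variant, here is a minimal sketch; the service and row types are invented, and the eviction method is exactly the "null the reference" duty mentioned above (javax.servlet matches Java EE; newer stacks use jakarta.servlet):

```java
import java.util.List;
import javax.servlet.http.HttpSession;

// Illustrative "poor man's cache": run the expensive calculation once per
// session, then serve pages out of the cached list.
interface ReportService {
    List<ReportRow> computeReport(); // the ~1 minute calculation
}

class ReportRow { /* fields omitted */ }

class ReportPager {
    private static final String CACHE_KEY = "report.cache";

    List<ReportRow> page(HttpSession session, ReportService service,
                         int page, int size) {
        @SuppressWarnings("unchecked")
        List<ReportRow> all = (List<ReportRow>) session.getAttribute(CACHE_KEY);
        if (all == null) {
            all = service.computeReport();        // pay the 1 minute once
            session.setAttribute(CACHE_KEY, all); // cached per user session
        }
        int from = Math.min(page * size, all.size());
        int to = Math.min(from + size, all.size());
        return all.subList(from, to); // only this slice is serialised to JSON
    }

    /** Call when the user leaves the grid, so 20 MB is not pinned per session. */
    void evict(HttpSession session) {
        session.removeAttribute(CACHE_KEY);
    }
}
```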
