I use GWT for the UI and Hibernate/Spring for the business layer. The following GWT widget is used to display the records (http://collectionofdemos.appspot.com/demo/com.google.gwt.gen2.demo.scrolltable.PagingScrollTableDemo/PagingScrollTableDemo.html). I assume the sorting is done on the client side.
I do not retrieve the entire result set since it's huge.
I use
principals = getHibernateTemplate().findByCriteria(criteria, fromIndex, numOfRecords);
to retrieve data. There are no sorting criteria in the Hibernate layer.
This approach does not give the correct behaviour, since it only sorts the current page of data on the client.
What is the best solution for this problem?
Note: I can get the primary sort column and the other sort columns from the UI framework.
Maybe I can sort the result by the primary sort column in the Hibernate layer?
You need to sort on the server.
Then you can either:
send the complete result set to the client and handle pagination on the client side. The problem is that the result set may be too big to retrieve from the db and send to the client.
handle the pagination on the server side. The client requests one page at a time, and the server asks the db for only that page. The problem then is that the db orders the same data again and again to extract page 1, page 2, etc., each time you ask it for a specific page. This can be a problem with a large database.
have a trade-off between both (for a large database):
Set a limit, say 300 items
The server asks the db for the first 301 items according to the order by
The server keeps the result set (up to 301 items) in a cache
The client requests pages from the server one at a time
The server handles the pagination using the cache
If there are 301 items, the client displays "The hit list contains more than 300 items. It has been truncated".
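The capped-cache trade-off can be sketched in plain Java. This is a minimal sketch under stated assumptions: `fetchSorted` is a hypothetical stand-in for the ordered database query (ORDER BY plus a row limit), the class name is illustrative, and instead of keeping item 301 the cache keeps 300 items plus a `truncated` flag, which carries the same information.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical server-side cache: keeps at most LIMIT sorted rows of one query. */
class CappedResultCache<T> {
    static final int LIMIT = 300;

    private final List<T> rows;      // first LIMIT rows, in the database's ORDER BY order
    private final boolean truncated; // true if the query matched more than LIMIT rows

    /** fetchSorted.apply(n) is assumed to run the ordered query and return at most n rows. */
    CappedResultCache(IntFunction<List<T>> fetchSorted) {
        List<T> fetched = fetchSorted.apply(LIMIT + 1);   // ask the db for 301 rows
        this.truncated = fetched.size() > LIMIT;          // row 301 only signals "there is more"
        this.rows = new ArrayList<>(fetched.subList(0, Math.min(LIMIT, fetched.size())));
    }

    /** Serves a 0-based page entirely from the cache, without touching the database. */
    List<T> page(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize < 1) {
            throw new IllegalArgumentException("invalid page request");
        }
        int from = Math.min(pageIndex * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return Collections.unmodifiableList(rows.subList(from, to));
    }

    boolean isTruncated() {
        return truncated;
    }

    int size() {
        return rows.size();
    }
}
```

When `isTruncated()` returns true, the client shows the "more than 300 items" message.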
Note 1: Usually, the user doesn't care about not being able to go to the last page. You can improve the solution by counting the total number of rows first (no ORDER BY is needed for that) so that you can display a more helpful message to the user, e.g. "Result contained 2023 elements, only the first 300 can be viewed".
Note 2: if you request the data page by page from the database without any ORDER BY, most databases (at least Oracle) don't guarantee any ordering. So you may get the same item on page 1 and page 2, because they come from two separate requests to the database. The same problem happens if multiple items have the same value in the column you order by (e.g. the same date): the db doesn't guarantee any ordering between elements with the same value. In that case, I would suggest using the PK as the last ORDER BY criterion (e.g. ORDER BY date, PK) so that the paging is done consistently.
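To see why the PK tie-breaker matters, here is a small Java simulation (the `LogRow` type and its fields are hypothetical). Java's `List.sort` is stable, so the sketch mimics the database by feeding the same tied rows in a different arrival order on each request; because the comparator ends with the unique id, like ORDER BY date, PK, page 1 and page 2 still line up with no duplicates or gaps.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical row: a non-unique sort column (date) plus a unique primary key (id). */
class LogRow {
    final long id;
    final LocalDate date;

    LogRow(long id, LocalDate date) {
        this.id = id;
        this.date = date;
    }
}

class StablePaging {
    /** Equivalent of "ORDER BY date, id": the PK breaks ties, so the order is total. */
    static final Comparator<LogRow> BY_DATE_THEN_ID =
            Comparator.comparing((LogRow r) -> r.date).thenComparingLong(r -> r.id);

    /** Sorts a copy of the rows (in whatever order they arrived) and returns a 0-based page. */
    static List<LogRow> page(List<LogRow> rowsInArrivalOrder, int pageIndex, int pageSize) {
        List<LogRow> sorted = new ArrayList<>(rowsInArrivalOrder);
        sorted.sort(BY_DATE_THEN_ID);
        int from = Math.min(pageIndex * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return sorted.subList(from, to);
    }
}
```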
Note 3: I speak about client and server, but you can adapt the idea to your particular situation.
Always have a sort column. By default it could be "name" or "id".
Use server side paging. I.e. pass the current page index and fetch the appropriate data subset.
In the fetch criteria / query use the sort column. If none is selected by the client, use the default.
Thus you will have your desired behaviour without trade-offs.
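The three steps above can be sketched in plain Java, assuming a hypothetical `Principal` entity with `id` and `name` columns. The sketch sorts and pages in memory only to stay self-contained; in the application from the question the sort would be pushed into the query (for example with Hibernate's `Criteria.addOrder`) and the page window into the existing `findByCriteria(criteria, fromIndex, numOfRecords)` call. Mapping client-supplied column names through a whitelist also keeps arbitrary input out of the ORDER BY.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical entity with two sortable columns. */
class Principal {
    final long id;
    final String name;

    Principal(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ServerSidePaging {
    static final String DEFAULT_SORT = "name";

    /** Whitelist of sortable columns; "name" ends with the id as a tie-breaker. */
    static final Map<String, Comparator<Principal>> SORTS = Map.of(
            "name", Comparator.comparing((Principal p) -> p.name).thenComparingLong(p -> p.id),
            "id", Comparator.comparingLong((Principal p) -> p.id));

    /** Sorts by the requested column (or the default) and returns one 0-based page. */
    static List<Principal> fetchPage(List<Principal> all, String sortColumn,
                                     boolean ascending, int pageIndex, int pageSize) {
        // Unknown or missing columns fall back to the default sort.
        Comparator<Principal> order = SORTS.getOrDefault(
                sortColumn == null ? DEFAULT_SORT : sortColumn, SORTS.get(DEFAULT_SORT));
        if (!ascending) {
            order = order.reversed();
        }
        return all.stream()
                .sorted(order)
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```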
It will be confusing to the user if you sort a partial result in the GUI but page on the server.
Since the data set is huge, sending the entire data set to the user and doing both paging and sorting there is a no-go.
That only leaves both sorting and paging on the server. You can use Criteria.addOrder() to do sorting in hibernate. See this tutorial.
In Angular 8+, if we need to display a list of records, we display the results with pagination.
We have more than 1 million records, and the number will keep growing.
I am using Spring Boot and MySQL as the database.
Which approach is preferable?
Getting all the data from server at once and handle Pagination at client side.
Get 10 Records at once and display and when User click at Next Button get the next 10 records from Server.
I think you should use pagination rather than fetching all the data from the server.
Getting all the data from the server is a costly operation, since, as you mention, your application has more than a million records.
With pagination, the API is called only when needed and returns only the data for the requested page.
I would strongly advise you to go with variant #2.
The main reason to do pagination is not really because it makes sense to only display a few entries in the UI at once. Instead, pagination allows you to only transfer the necessary entries from large data sets (such as yours). This greatly improves performance and reduces the amount of data that has to be sent from the server to the client.
Variant #1 will have very poor performance, because the client has to fetch all 1,000,000 records to then only display 10 of them. This does not make a lot of sense and goes directly against the idea and the advantages of pagination.
Variant #2, on the other hand, will only fetch the entries that are actually displayed. It transfers roughly 0.001% of the data that variant #1 would (10 out of 1,000,000 records).
I would use something in between: load maybe 100 or 1000 records at a time. With one million your browser will run out of memory, and with 10 your user gets bored...
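A sketch of the "something in between" idea, written in Java like the other examples here even though the client in the question is Angular: the pager loads a larger chunk from the server and serves small pages from it, going back to the server only when the user pages outside the loaded chunk. `ChunkSource` is a hypothetical stand-in for the REST call, and the chunk and page sizes are assumptions.

```java
import java.util.List;

/** Hypothetical server call: returns up to `limit` rows starting at `offset`. */
interface ChunkSource<T> {
    List<T> fetch(int offset, int limit);
}

/** Serves small pages from a larger chunk, fetching a new chunk only when needed. */
class ChunkedPager<T> {
    private final ChunkSource<T> source;
    private final int chunkSize;
    private final int pageSize;
    private int chunkOffset = -1;     // offset of the loaded chunk, -1 if none loaded yet
    private List<T> chunk = List.of();
    int serverCalls = 0;              // only here to show how often the server is hit

    ChunkedPager(ChunkSource<T> source, int chunkSize, int pageSize) {
        if (pageSize < 1 || chunkSize % pageSize != 0) {
            throw new IllegalArgumentException("chunkSize must be a multiple of pageSize");
        }
        this.source = source;
        this.chunkSize = chunkSize;
        this.pageSize = pageSize;
    }

    /** Returns the 0-based page, loading the chunk that contains it if it is not cached. */
    List<T> page(int pageIndex) {
        int rowOffset = pageIndex * pageSize;
        int neededChunk = (rowOffset / chunkSize) * chunkSize;
        if (neededChunk != chunkOffset) {
            chunk = source.fetch(neededChunk, chunkSize);
            chunkOffset = neededChunk;
            serverCalls++;
        }
        int from = Math.min(rowOffset - chunkOffset, chunk.size());
        int to = Math.min(from + pageSize, chunk.size());
        return chunk.subList(from, to);
    }
}
```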
I am working on a web app backed by Amazon DynamoDB.
I want to implement a feature that lets users jump directly to the Nth page to view the item info.
I have been told that pagination in DynamoDB is based on the last key rather than limit/offset; it doesn't natively support offset (see DynamoDB Scan / Query Pagination).
Does that mean that if I want to get to the 10th page of items, I have to query the 9 pages before it first? (That seems like a really poor solution.)
Is there an easier way to do that?
You are right. DynamoDB doesn't support numerical offset. The only way to paginate is to use the LastEvaluatedKey parameter when making a request. You still have some good options to achieve pagination using a number.
Fast Cursor
You can make fast pagination requests by discarding the full items and fetching only the keys. You are limited to 1 MB per request, but that still represents a large number of keys. Using this, you can move your cursor to the required position and then start reading full objects.
This solution is acceptable for small/medium datasets. You will run into performance and cost issues on large datasets.
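The fast-cursor idea can be simulated in plain Java without the AWS SDK. Here a `TreeMap` stands in for a table queried in key order; `queryKeys` plays the role of a key-only Query (in DynamoDB that would be a ProjectionExpression on the key attributes, with `ExclusiveStartKey` set from the previous `LastEvaluatedKey`), and all class and method names are hypothetical. The cursor walks cheap key-only batches to the position just before page n, then reads full items from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for a table: sort key -> full item. */
class FakeTable {
    final NavigableMap<String, String> items = new TreeMap<>();

    /** Like a key-only Query with ExclusiveStartKey and Limit. */
    List<String> queryKeys(String exclusiveStartKey, int limit) {
        NavigableMap<String, String> view = exclusiveStartKey == null
                ? items : items.tailMap(exclusiveStartKey, false);
        List<String> keys = new ArrayList<>();
        for (String k : view.keySet()) {
            if (keys.size() == limit) {
                break;
            }
            keys.add(k);
        }
        return keys;
    }

    /** Like a full-item Query starting after exclusiveStartKey. */
    List<String> queryItems(String exclusiveStartKey, int limit) {
        List<String> out = new ArrayList<>();
        for (String k : queryKeys(exclusiveStartKey, limit)) {
            out.add(items.get(k));
        }
        return out;
    }
}

class FastCursor {
    /** Returns page n (1-based): skip (n - 1) * pageSize keys cheaply, then read full items. */
    static List<String> page(FakeTable table, int n, int pageSize, int keyBatch) {
        int toSkip = (n - 1) * pageSize;
        String lastKey = null;                 // plays the role of LastEvaluatedKey
        while (toSkip > 0) {
            List<String> keys = table.queryKeys(lastKey, Math.min(keyBatch, toSkip));
            if (keys.isEmpty()) {
                return List.of();              // page n is past the end of the data
            }
            lastKey = keys.get(keys.size() - 1);
            toSkip -= keys.size();
        }
        return table.queryItems(lastKey, pageSize);
    }
}
```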
Numerical index
You can also create a global secondary index over which you paginate your dataset. For example, you can add an offset property to all your objects and query this global index directly to get the desired page.
Obviously this only works if you don't use any custom filter, and you have to maintain this value when inserting, deleting, or updating objects. So this solution is only good if you have an append-only dataset.
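A minimal in-memory sketch of the numerical-index approach, with a `TreeMap` standing in for the global secondary index. It assumes the offset attribute is the index's sort key under a single fixed partition key value, so that page n becomes one range query (in DynamoDB, a key condition such as BETWEEN); the class and method names are hypothetical. The comment on `append` shows why the approach only holds for append-only data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical append-only store that stamps each item with a sequential offset. */
class OffsetIndexedStore {
    // Stands in for a GSI whose sort key is the offset attribute.
    private final NavigableMap<Long, String> byOffset = new TreeMap<>();
    private long nextOffset = 0;

    /** Appending assigns the next offset; deleting or reordering items would break the numbering. */
    void append(String item) {
        byOffset.put(nextOffset++, item);
    }

    /** Page n (1-based) is a single range query: offset BETWEEN start AND end. */
    List<String> page(int n, int pageSize) {
        long start = (long) (n - 1) * pageSize;
        long end = start + pageSize - 1;
        return new ArrayList<>(byOffset.subMap(start, true, end, true).values());
    }
}
```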
Cached Cursor
This solution builds on the first one. Instead of fetching keys every single time, you can cache the page positions and reuse them for other requests. Cache tools like Redis or Memcached can help you achieve that.
You check the cache to see if pages are already calculated
If not, you scan your dataset fetching only the keys, then store the starting key of each page in your cache.
You request the desired page to fetch full objects
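The three cached-cursor steps might look like this in Java. A `HashMap` stands in for Redis or Memcached and a `TreeMap` for the table; `queryId` is a hypothetical cache key that is assumed to encode both the query and the page size, since the page positions depend on both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class CachedCursor {
    private final NavigableMap<String, String> table;                 // stand-in for the table
    private final Map<String, List<String>> cache = new HashMap<>();  // stand-in for Redis/Memcached
    int keyScans = 0;                                                 // how often step 2 ran

    CachedCursor(NavigableMap<String, String> table) {
        this.table = table;
    }

    /** Returns page n (1-based) of the query identified by queryId. */
    List<String> page(String queryId, int n, int pageSize) {
        // Step 1: are the page positions for this query already cached?
        List<String> starts = cache.get(queryId);
        if (starts == null) {
            // Step 2: key-only pass, recording the starting key of each page.
            starts = new ArrayList<>();
            int i = 0;
            for (String key : table.keySet()) {
                if (i % pageSize == 0) {
                    starts.add(key);
                }
                i++;
            }
            cache.put(queryId, starts);
            keyScans++;
        }
        if (n < 1 || n > starts.size()) {
            return List.of();
        }
        // Step 3: fetch full items for the requested page only.
        List<String> out = new ArrayList<>();
        for (String item : table.tailMap(starts.get(n - 1), true).values()) {
            if (out.size() == pageSize) {
                break;
            }
            out.add(item);
        }
        return out;
    }
}
```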
Choose the solution that fits your needs. I hope this will help you :)
I'm creating a hypermedia driven RESTful API that will be used to query transactional data. The intention is that the results will be paginated.
Each API call will query an indexed database table. Since I don't want to keep the results on the server due to memory considerations, I was thinking of retrieving the data based on rownum, depending on which page is requested, e.g. on page one WHERE rownum <= 10, on page two WHERE rownum BETWEEN 11 AND 20, etc.
However, the database in question is replicated from a production system, and records could be inserted into a part of the result set that has already been requested. E.g. page one is requested -> 10 rows are returned -> a transaction is inserted at row 5. Now page two will include a record already displayed on page one, because every row after the insertion is shifted by one rownum.
What would be a good way of achieving my objective of creating a hypermedia driven RESTful API that provides paginated transactional data from a database, without holding on to the result sets for the duration of the session?
This is a pretty common problem and there are actually not many approaches.
I can think of only three, actually:
1. You don't care, and the results change. This is the behaviour of Stack Overflow: if you're on page 2 of the questions page and someone posts a new question, when you click on page 3 you may get one or more questions that were already listed on page 2, because the index has shifted.
2. If you don't want to keep the actual data in memory, you're in for a lot of trouble. You could store the handle of the result set, instead of the results themselves, and loop over it fetching the number of rows that you actually need. E.g. you run the select, fetch 10 rows and store the result set handle. Together with the rows, you return a unique ID of the query to the client. The problem comes when a range is requested, because you can't really "rewind" a database cursor, and that would mean caching the results, which you may want to do anyway. But if you do it like that, sooner or later you're going to have all of the results in memory anyway.
3. You could still use some memory, but keep only a unique identifier of each row, associated with a unique identifier of the query, as above. This could work, but only if rows can be added and not deleted or updated (if they're updated, they may not match the query any more).
Personally, I'd go with option 1.
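Option 3 can be sketched as follows, assuming the server can run the query once and collect the matching primary keys in order. The class and method names are hypothetical; the client receives the query id and asks for pages by index, and the server would then load the full rows for the returned ids (e.g. WHERE id IN (...)). Inserts after the snapshot do not shift the pages, but deleted or updated rows are not reflected.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical snapshot store: query id -> ordered row ids captured when the query first ran. */
class IdSnapshotPager {
    private final Map<String, List<Long>> snapshots = new ConcurrentHashMap<>();

    /** Keeps only the matching ids (not the rows) and returns a query id for the client. */
    String start(List<Long> matchingIdsInOrder) {
        String queryId = UUID.randomUUID().toString();
        snapshots.put(queryId, List.copyOf(matchingIdsInOrder));  // immutable copy
        return queryId;
    }

    /** Returns the ids on a 0-based page of the snapshot; later inserts do not shift it. */
    List<Long> pageIds(String queryId, int pageIndex, int pageSize) {
        List<Long> ids = snapshots.get(queryId);
        if (ids == null) {
            throw new IllegalArgumentException("unknown or expired query id");
        }
        int from = Math.min(pageIndex * pageSize, ids.size());
        int to = Math.min(from + pageSize, ids.size());
        return ids.subList(from, to);
    }
}
```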
I am thinking of building a page in an application where each query can return a result set that cannot fit in memory, or that is very expensive to fetch in full. The user will hit "Get more" to get more of the results. I wonder if I could use a yielder for Java, something like this (http://benjiweber.co.uk/blog/2015/03/21/yield-return-in-java/), and whether I will need WebSockets, e.g. from Spring (http://docs.spring.io/spring/docs/current/spring-framework-reference/html/websocket.html), so that the client can tell the server to push more results. Could you also give an example of the handshake? Will the endpoint URI be based on some session id as well? Also, when databases like OrientDB/Neo4j return Iterables, does that mean we can keep the connection open and get the next rows minutes later without problems? Thanks!
You are talking about two different concepts.
Pagination
If you have a large result set and you need to return it piece by piece to avoid long query times or high memory requirements, you're paginating over the result set.
To do this, the client requests another piece of the set each time the user hits the "Get More" button. Each time, the server receives a request from the client and hits the DB with a paginated query.
Example in MySQL (page 10 with 10 results per page, i.e. rows 91-100, counting pages from 1):
SELECT * FROM my_table ORDER BY id LIMIT 10 OFFSET 90
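Computing the offset from a 1-based page number is an easy place for off-by-one mistakes. A minimal JDBC sketch for MySQL, assuming 1-based page numbers and a hypothetical table `my_table` with an `id` column; binding the values with `?` placeholders (which MySQL accepts for LIMIT and OFFSET) avoids building the SQL by string concatenation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class OffsetPaging {
    /** Offset of the first row of a 1-based page: page 10 at 10 rows/page skips 90 rows. */
    static long offsetFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (long) (page - 1) * pageSize;
    }

    /** Fetches the ids on one page; my_table and id are hypothetical names. */
    static List<Long> fetchIds(Connection conn, int page, int pageSize) throws SQLException {
        String sql = "SELECT id FROM my_table ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setLong(2, offsetFor(page, pageSize));
            List<Long> ids = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
            return ids;
        }
    }
}
```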
Websockets / Yielder
You'll need a websocket / yielder when the server is the one that sends data; in other words, the client doesn't request an update, it just keeps the socket open and receives updates from the server as they come.
That's the case for a messaging service, for example, where it avoids constant polling from the client side.
In your case a websocket is completely unnecessary. You can also see an example of what I'm saying here -> What's the behavioral difference between HTTP Stay-Alive and Websockets?
However, you can set up a keep-alive connection between your back end and the database to avoid constantly closing and reopening the connection each time the user requests more results.
Finally, your question about Iterable results in Neo4j: Neo4j's result type is an iterable of Map<String,Object>, where each map represents one row as key-value pairs. That doesn't keep the connection alive (by default); it only iterates through the returned results of that particular query.
I have a database table for log messages, and new rows can be inserted at any time. I want to show them in a grid, and when you scroll down I want to request more rows from this table (server side) without being affected by newly added rows. The new rows should only become visible when I refresh the whole grid.
I'm not sure how to request rows in a range (from, to) using JDBC. I think there is no portable (across different databases) SQL query to do this? (I'm using MySQL.)
I think that after reading the first page of this table I have to send the max id from the log table to the client, and after that request new rows using this max id as a parameter in SQL (WHERE id <= MAXID), but I'm not sure how to pass this parameter from the server to the client and back using RestDataSource.
Do you have any better ideas how I can do this?
P.S. I'm using the LGPL SmartGWT version and my own servlets on the server side.
Here is what I would do; I imagine that you have either an ever-increasing ID or a timestamp for each of your rows.
Before you start querying for data, you call a web service to query the current id (e.g. the last row inserted is 12345).
Then you add a Criteria object to your datasource that says "rowId <= 12345". At this point, you can use the grid freely - paging, sorting, etc will work automatically as new rows will automatically be excluded.
(Or if you use a custom datasource rather than the default RestDataSource, you basically do the same thing without using Criteria explicitly.)
SmartGWT Pro and above do this automatically. Even if you don't want to use Pro, you can download the evaluation (smartclient.com/builds) and watch the server-side console, where the SQL queries are logged.
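The snapshot approach can be simulated in plain Java (an in-memory list stands in for the log table; all names are hypothetical). The grid captures `maxId()` once, then every page request applies the `id <= snapshotMaxId` criterion, so rows inserted afterwards never shift the pages until the grid is refreshed with a new snapshot. In SQL each page would be something like SELECT * FROM log WHERE id <= ? ORDER BY id DESC LIMIT ? OFFSET ?, with the table name again an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory stand-in for the log table; ids grow monotonically. */
class LogTable {
    private final List<Long> ids = new ArrayList<>();
    private long nextId = 1;

    long insert() {
        ids.add(nextId);
        return nextId++;
    }

    /** Step 1: the "current id" web service, i.e. SELECT MAX(id) FROM log. */
    long maxId() {
        return nextId - 1;
    }

    /** Step 2: a page under the criterion id <= snapshotMaxId, newest first. */
    List<Long> page(long snapshotMaxId, int pageIndex, int pageSize) {
        return ids.stream()
                .filter(id -> id <= snapshotMaxId)
                .sorted(Comparator.reverseOrder())
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```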