Spring Pagination ?size really affects request time

I have a simple CRUD application: it fetches some entities, converts them to DTOs and sends them via a REST controller.
I want to search by a field that is not unique but very rarely repeats. So when I send a POST with 'field':'123', my app returns only 2 records.
The question is: why does the request take a really long time (~900ms) when I send ?sort=someFiled%2Cdesc&size=10000&page=0, but only ~150ms when I send it with size=10? Since the result is only 2 records, I thought the request should take the same time regardless of size.
Tested on my remote machine, because locally the time was too short to compare.
Is this a Spring problem?

Related

Server/Client live updates

I have some problems understanding the best concept for my problem.
My architecture is pretty basic. I have a backend with data that can be updated, and clients which load the data with some filters.
The backend holds the data in an EHCache.
The data model is pretty basic for example
{
id: string,
startDate: date,
endDate: date,
username: string,
group: string
}
The data can only be modified by another backend service.
When data is modified, added or deleted, a data update event is generated.
The clients are all web clients and use a Spring Boot REST service to fetch the data from the cache.
For the data request, each client sends its own request settings. There are different settings, like date and text filters. For example
{
contentFilter: Filter,
startDateFilter: date,
endDateFilter: date
}
The backend uses these settings to filter the data from the cache and then sends the response with the filtered data.
When the cache generates an update event, every client gets notified over a websocket connection.
Each client then requests the full data again with the same request settings as before.
My problem is that there are many cache updates happening, and the clients can have a lot of data to load if the full dataset is loaded every time.
For example I have this scenario.
Full dataset in cache: 100 000 rows
Update of rows in cache: 5-10 random rows every 1-5 seconds
Client1 dataset with request filter: 5000 rows
Client2 dataset with request filter: 50 rows
Now every time the client receives an update notification, it loads the complete dataset (5000 rows), and that happens every 1-5 seconds. If the update always hits the same row, and that row isn't loaded by the client because of its filter settings, then the client loads the data unnecessarily.
I am not sure what would be the best solution to reduce the client updates and increase the performance.
My first thought was to just send the updated row directly to the clients over the websocket connection.
But for that I would have to know whether the client "needs" the updated row. If the updates happen on rows that the client doesn't need to load because of its filter settings, then I would spam the client with unnecessary updates.
I could add a check on the client side for whether the id of the updated row is in the loaded dataset, but then I would need a separate check for rows that are added to the cache rather than updated.
But I am not sure if that is the best practice. And unfortunately I can not find many resources about this topic.

The most efficient things are always the most work, sadly.
I won't claim to be an expert at this kind of thing - on either the implementation(s) available or even the best practices - but I can give some food for thought at least, which may or may not be of help.
My first choice: your first thought.
You have the problem of knowing if the updated item is relevant to the client, due to the filters.
Save the filters for the client whenever they request the full data set!
Row gets updated, check through all the client filters to see if it is relevant to any of them, push out to those it is.
The effort for maintaining that filter cache is minimal (update whenever they change their filters), and you'll also be sending down minimal data to the clients. You also won't be iterating over a large dataset multiple times, just the smaller client set and only for the few rows that have been updated.
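A minimal sketch of that filter cache in plain Java, under some assumptions: the `Row` and `ClientFilterRegistry` names are made up for illustration, each client's filter settings are assumed to be expressible as a `Predicate<Row>`, and the actual websocket push is left out.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical row type mirroring the data model from the question.
class Row {
    final String id;
    final LocalDate startDate;
    final LocalDate endDate;
    final String username;
    final String group;

    Row(String id, LocalDate startDate, LocalDate endDate, String username, String group) {
        this.id = id;
        this.startDate = startDate;
        this.endDate = endDate;
        this.username = username;
        this.group = group;
    }
}

// Hypothetical registry that remembers the latest filter of each connected client.
class ClientFilterRegistry {
    private final Map<String, Predicate<Row>> filtersByClient = new ConcurrentHashMap<>();

    // Call whenever a client requests the full data set with (new) filter settings.
    void saveFilter(String clientId, Predicate<Row> filter) {
        filtersByClient.put(clientId, filter);
    }

    // Call when the client's websocket session is closed.
    void removeClient(String clientId) {
        filtersByClient.remove(clientId);
    }

    // Returns the ids of all clients whose saved filter matches the updated row.
    List<String> clientsInterestedIn(Row updated) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Predicate<Row>> entry : filtersByClient.entrySet()) {
            if (entry.getValue().test(updated)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

On each cache update event the server would call `clientsInterestedIn(row)` and push the row only to those sessions, so the cost per update scales with the number of clients and not with the 100 000 cached rows.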
Another option:
If you don't go ahead with option 1, option 2 might be to group updates - assuming you have the luxury of not needing immediate, real-time updates.
Instead of telling the clients about every data update, only tell them every x seconds that there might be data waiting for them (might be, you little tease).
I was going to add other options but, to be honest, I don't see why you'd worry about much beyond option 1, maybe with an option 2 addition to reduce traffic if that's an issue.
'Best practice'-wise, sending down multiple FULL datasets to multiple clients multiple times a second is certainly not it.
Sending only the data relevant to each client is a much better solution, and it is an added bonus if you can further reduce how much the client even needs to send (i.e. only their filter updates, rather than re-sending something you could already have saved).
Edit:
Ah, stateless server - though it's not really stateless. You're using web sockets, so the server has some kind of state for those connections. It's already stateful so option 1 doesn't really break anything.
If it's to be completely stateless, then you also can't store the updated rows of data, so you can't return those individually. You're back to what you're doing which is a full round-trip and data read + serve.
Option 3, though, if you're semi-stateless (you don't want to add any metadata to those socket connections) but do hold updated rows: timestamp them and have the clients send the time of their last update along with their filters. You can then return only the rows updated since their last update that match their provided filters. Arguably this is still stateless, since the timestamp becomes just another filter.
Either way, limiting the updated data back down to the client is the main goal if for nothing else than saving data transfer.
Edit 2:
Sounds like you may need to send two bits of data down (or three if you want to split things even further - makes life easier client-side, I guess):
{
newItems: [{...}, ...],
updatedItems: [{...}, ...],
deletedIds: [1,2...]
}
Yes, when their request for an update comes, you'll have to check through your updated items to see if any are deleted and of relevance to the client's filters, but you can send down a minimal list of ids rather than whole rows that your client can then remove.
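To make the timestamp idea and the three-part payload a bit more concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `Item`, `Change`, `Delta` and `ChangeLog` names, the in-memory list, and the choice to keep the last known state of a deleted item so the client's filter can still be applied to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, trimmed-down item with just an id and one filterable field.
class Item {
    final String id;
    final String group;

    Item(String id, String group) {
        this.id = id;
        this.group = group;
    }
}

enum ChangeType { ADDED, UPDATED, DELETED }

class Change {
    final ChangeType type;
    final Item item;      // for DELETED: the last known state of the item
    final long timestamp; // server time of the change, in epoch millis

    Change(ChangeType type, Item item, long timestamp) {
        this.type = type;
        this.item = item;
        this.timestamp = timestamp;
    }
}

// Same shape as the payload above: newItems, updatedItems, deletedIds.
class Delta {
    final List<Item> newItems = new ArrayList<>();
    final List<Item> updatedItems = new ArrayList<>();
    final List<String> deletedIds = new ArrayList<>();
}

class ChangeLog {
    private final List<Change> changes = new ArrayList<>();

    synchronized void record(ChangeType type, Item item, long timestamp) {
        changes.add(new Change(type, item, timestamp));
    }

    // Collects every change strictly newer than `since` whose item matches the client's filter.
    synchronized Delta deltaSince(long since, Predicate<Item> filter) {
        Delta delta = new Delta();
        for (Change c : changes) {
            if (c.timestamp <= since || !filter.test(c.item)) {
                continue;
            }
            switch (c.type) {
                case ADDED:
                    delta.newItems.add(c.item);
                    break;
                case UPDATED:
                    delta.updatedItems.add(c.item);
                    break;
                case DELETED:
                    delta.deletedIds.add(c.item.id);
                    break;
            }
        }
        return delta;
    }
}
```

This sketch does not merge several changes to the same item, never trims old entries, and will not report an item whose update moved it out of the client's filter, so a real version would need to handle those cases.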

Check If your Rest API is called or not in certain time interval

Consider 2 REST APIs implemented using Java and Spring.
The first API (https://example.com/some-service/Requests), which our application consumes, raises a request. It does some processing at the back end, but it does not return the actual result; it only returns a success response.
The actual response to my request takes some time, for example 13 mins, and it is sent through a second API which our application exposes, for example (<ourApplication.com/notifyRaisedRequests>).
I want to write Java code such that the response from the 2nd API must arrive no more than 13 mins after I raise the request using the 1st API. If it takes more than 13 mins, execute the failure part of the code; if less than 13 mins, execute the success part.
How can this be achieved using Spring Boot and Java? There should be some way in Spring Boot to keep checking whether the 2nd API was called within the time interval (13 mins). Any help is appreciated.

The code which raises the request:
Response r = post("/Requests");
db.insert(r.getUniqueId(), now);
The code which receives the notification:
void listener(NotificationRequest r) {
db.delete(r.getUniqueId());
// do the success action
}
The code which runs periodically checking the outstanding requests:
for (DbRecord r: db.selectAllRecords()) {
if (now - r.time > 15 minutes) {
db.delete(r.id);
// do the failure action
}
}
This is all pseudocode obviously.
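For what it's worth, the same three pieces could look roughly like this in plain Java. This is only a sketch: `PendingRequests` is a made-up in-memory stand-in for the `db` above, the timeout is passed in by the caller, and the current time is a parameter so the logic can be tried without waiting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory replacement for the "db" in the pseudocode.
class PendingRequests {
    private final Map<String, Instant> raisedAt = new ConcurrentHashMap<>();
    private final Duration timeout;

    PendingRequests(Duration timeout) {
        this.timeout = timeout;
    }

    // Call right after the first API has accepted the request.
    void raised(String uniqueId, Instant now) {
        raisedAt.put(uniqueId, now);
    }

    // Call from the notification endpoint.
    // Returns true if the request was still outstanding, i.e. the success action should run.
    boolean notified(String uniqueId) {
        return raisedAt.remove(uniqueId) != null;
    }

    // Call periodically. Removes and returns the ids that have been waiting longer
    // than the timeout, i.e. the ones for which the failure action should run.
    List<String> expire(Instant now) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : raisedAt.entrySet()) {
            boolean tooOld = Duration.between(entry.getValue(), now).compareTo(timeout) > 0;
            // remove(key, value) makes sure only one of notified()/expire() wins for an id
            if (tooOld && raisedAt.remove(entry.getKey(), entry.getValue())) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

In a Spring Boot application `expire(Instant.now())` could be called from a `@Scheduled` method. Note that a notification arriving after the deadline but before the next sweep still counts as a success here, so the sweep interval decides how precise the cut-off is.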

This isn't really specific to Java, it sounds more like an architectural concern.
A user or a process requests resource X by calling the first api
An "OK" (200) response is returned
A user or a process requests resource Y (notify) by calling the second api
If less than Z amount of time has passed between the call to Y and the call to X then send response A; otherwise send response B
One way to tackle this, as suggested in the comments, is to use a database to keep a unique record with a timestamp for each request to the first api (X). When the second api (Y) is called, it's straightforward: get the difference in time and act accordingly.
As for generating a unique identifier for each request to the first api, depending on where the record is stored you could do different things but my suggestion would be to keep it really simple and use a random uuid: UUID.randomUUID().
This implies that the response the first api sends back to whatever requested it will have that same uuid in it somewhere since this is necessary for the second api.
For example:
request to api X is received by a spring rest controller (or similar). Somewhere in there (i.e. service layer called by the controller) a unique identifier will be created and a record will be saved in the database. This unique identifier must be returned in the response.
request to api Y is made with the uuid returned in the X's response. This can then be used to recover more information that was saved in the database for that request (such as the timestamp).
EDIT
As per more indications in the comments, three components are needed: a rest controller, a service for data storage / retrieval and a scheduler.
A call is made to first api via controller which uses the service to store information specific to that request. The controller's response returns the uuid to the user for reference.
A scheduler uses that same service to periodically check if the second call has been made or not and decides what to do based on the amount of time lapsed.
If a second call is made, it also uses the same service to save the relevant data and/or delete old records as necessary.

I would suggest you analyze your code and API calls first: find exactly where the time is being spent, e.g. while fetching the data from the DB (in which case you need to optimize your DB query), while processing the records from the client/DB, etc.
You can use any monitoring tool on the market for the first analysis. You will get an idea of your API calls, method execution times, thread dumps, etc.
This link lists some tools:
https://geekflare.com/api-monitoring-tools/
If you are using Spring, use Spring Boot's built-in production-ready features (Actuator) to get the metrics of the application.
See the link below for more reference:
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html
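Before reaching for a full monitoring tool, a quick first check is to time the individual stages of the request by hand. The helper below is a hypothetical sketch (`StageTimer` is not part of Spring or any library), intended to be created once per request, so it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: accumulates wall-clock time per named stage of a request.
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the work, adds its duration to the given stage, and returns the work's result.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Accumulated nanoseconds per stage, in the order the stages were first seen.
    Map<String, Long> nanosByStage() {
        return nanosByStage;
    }
}
```

Wrapping, say, the repository call, the DTO mapping and the serialization in separate `time(...)` calls and logging `nanosByStage()` for size=10 and size=10000 should show which stage actually grows with the page size.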

Retrieve data and wait time to check if something new appeared in database

I would like to retrieve data from my in-memory H2 database via a REST endpoint using Spring and Java 8. I have 2 endpoints: one to retrieve data, and a second one to add data to the database.
How can I achieve something like what is described below in the easiest way? I am not sure which solution would be better; I thought about a JMS queue or CompletableFuture (if that is possible). It should work for a few users, who will call to retrieve data saved under their id number.
Scenario:
User calls the REST endpoint to retrieve data.
If data is present in the database, it is retrieved and returned to the user.
If data is not present in the database, the connection is held for 60 seconds, and if something appears in the database during that time (added via the endpoint for adding new data), that data is returned.
If data is not present in the database and no new data appears within 60 seconds, the endpoint returns no content.

There are multiple ways of doing that, and if the requirements are clear I suggest the two approaches below.
Approach 1:
Find and return the data without waiting if it is available.
If the data is not available, set a resource id and retrieveTime in a header and respond to the consumer.
Based on the resource id, you can have the data ready once it becomes available.
This way you can be sure that your endpoint's service time is always consistent; ideally it shouldn't be more than 3 seconds.
Approach 2:
If the data is not available, sleep for 60 seconds (outside the database connection scope) and then try again on the same thread.
You don't need any queue or async process here.
Here you are losing resources, and the service time will be longer.
Apart from these approaches, if your systems use eventing, then use an eventing approach: when a record is persisted, send an event to the consumer (many databases today can send events to a source system).
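Since the question mentions CompletableFuture, here is a rough sketch of that variant instead of a fixed sleep: the "retrieve" side waits on a future that the "add" side completes. The `DataWaiter` class, the `String` payload and the `lookup` callback (standing in for the database query) are all assumptions, and the sketch assumes at most one waiting request per user id.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper shared by the "retrieve" and "add" endpoints.
class DataWaiter {
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the "add" endpoint after the row was saved.
    // Returns true if a waiting "retrieve" request was woken up.
    boolean dataAdded(long userId, String data) {
        CompletableFuture<String> future = pending.get(userId);
        return future != null && future.complete(data);
    }

    // Called by the "retrieve" endpoint. `lookup` is the database query for this user.
    Optional<String> await(long userId, Supplier<Optional<String>> lookup, long timeout, TimeUnit unit) {
        // Register first, then query, so an insert between the two steps is not missed.
        CompletableFuture<String> future = pending.computeIfAbsent(userId, id -> new CompletableFuture<>());
        try {
            Optional<String> existing = lookup.get();
            if (existing.isPresent()) {
                return existing;
            }
            return Optional.of(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty(); // the endpoint would answer with "no content"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pending.remove(userId, future);
        }
    }
}
```

This still blocks the request thread for up to the timeout, just like the sleep, but it returns as soon as the data shows up. In Spring MVC the same idea can be made non-blocking by returning a `DeferredResult` or a `CompletableFuture` from the controller method.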

HTTP.GET operation with huge list of parameters Spring Rest

I am trying to build a Spring REST read operation using Spring Boot. Typically, for all read-only operations the preference should be HTTP GET only (at least as far as I know).
Scenario: The client will send a list of UUID values (think of them as employee IDs) to read employee data. The client can select a bunch of employees and read their data.
Once the request is received, I need to iterate through those IDs and invoke an existing third-party service which will give me the employee data.
Once all UUIDs are processed, a report will be generated for all the selected employees.
The items I would like to hear from you all about are:
How to achieve a GET operation here when the incoming IDs exceed the HTTP GET URI limit? If there are 100 IDs, the URI is going to reach the limit.
Please do not suggest HTTP POST, because of a few limitations in the requirements.
Any references for handling this scenario asynchronously are much appreciated.
If you suggest storing the IDs in a table first and processing them later: sorry, that is not what I am looking for, because the client needs this data in less than 10 seconds (approx).

How to achieve GET operation here when incoming IDs are more than HTTP GET URI limit. Because if the IDs are 100 then the URI is going to reach the limit
Instead of sending these IDs in the URI, put them in a request body sent with the GET request.
HTTP GET with request body

You can totally send the UUID's as a request body with GET call. It works just fine.
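On the client side, a GET with a body can be built with the JDK's `java.net.http` API (Java 11+), as sketched below. The `GetWithBody` class and the JSON array format are assumptions for illustration. Keep in mind that the HTTP specification does not define any semantics for content in a GET request, so some proxies, gateways or servers may drop or reject the body; it is worth testing against the actual infrastructure.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds the request; sending it is left to an HttpClient.
class GetWithBody {

    // Builds a GET request whose body is a JSON array of ids instead of a long query string.
    static HttpRequest build(URI uri, List<String> ids) {
        String json = ids.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                // method(...) allows attaching a body to any method name, including GET
                .method("GET", HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The resulting request can be passed to `HttpClient.send(...)` as usual. On the Spring side, the controller method would read the list from the request body rather than from query parameters.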

OK, you are very restricted, but I can see two ways to approach it: group the IDs or send them in parts. My suggestions are:
I read point 4, but you can improve your request and execution time by sending async requests: send each segment with an ID and the total number of UUIDs, so the server gets all the information in a short time and can then process it.
Make segments of UUIDs so they are identified by group and not individually; then you will have fewer UUIDs to send.
I don't know if you can hook a "selected" event on a check box to send a request for every selection; then, when the user triggers the "generate report" event, you already have all the data on the server.

How do you robustly implement a REST service that retrieves DB records then purges them before returning?

Scenario: Imagine a REST service that returns a list of things (e.g. notifications).
Usage: A client will continually poll the REST service. The REST service retrieves records from the database. If records are available, they are converted into JSON and returned to the client. At the same time, the retrieved records are purged from the DB.
Problem: How do you handle the case where the REST endpoint encounters a problem writing the results back to the client? By that time, the records have been deleted.

Deleting the records will always be a dangerous proposition. What you could do instead is include a timestamp column on the data. Then have your REST url include a "new since" timestamp. You return all records from that timestamp on.
If the notifications grow to be too large, you can always set up an automated task to purge records more than an hour old, or whatever interval works well for you.

It sounds like a strange idea to delete DB records after read access. Possible problems immediately leap to mind: network trouble prevents the client from reading the data, multiple clients cause each other to see incomplete lists, etc.
The RESTful approach might be like this:
Give each notification a specific URI. Allow GET and DELETE on these URIs. The client may trigger the record deletion once it has successfully received and processed the notification.
Provide a URI to the collection of current notifications. Serve a list of notification data (ID, URI, timestamp, (optional:) content) upon GET request. Take a look at the Atom protocol for ideas. Optional: Allow POST to add a new notification.
With this approach all reading requests stay simple GETs. You may use the usual HTTP caching mechanisms on proxies and clients for performance improvement.
Anyway: Deleting a DB entry is a state change on the server. You must not do this upon a GET request. So POST will be your primary choice. But this does not help you much, since the communication might still not be reliable. And polling with POSTs smells a lot more like Web-Services than REST.

This could be a possible solution.
The REST service can query the database for a list of notifications.
It marks each item in the list, say by setting a flag in the database.
It then delivers all these notification records to the client.
If the server has successfully sent the results to the client, all marked records are deleted.
In case of failure, the marked records are unmarked, so that they get delivered during the subsequent polling interval.
I hope you got my point.
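The steps above could be sketched like this, with an in-memory map standing in for the database table and a set standing in for the flag column. `NotificationStore` and its method names are made up for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the notification table and its "marked" flag.
class NotificationStore {
    private final Map<Long, String> records = new LinkedHashMap<>();
    private final Set<Long> marked = new HashSet<>();

    synchronized void add(long id, String payload) {
        records.put(id, payload);
    }

    // Selects all records that are not yet marked and marks them as "being delivered".
    synchronized Map<Long, String> markForDelivery() {
        Map<Long, String> batch = new LinkedHashMap<>();
        for (Map.Entry<Long, String> entry : records.entrySet()) {
            if (marked.add(entry.getKey())) {
                batch.put(entry.getKey(), entry.getValue());
            }
        }
        return batch;
    }

    // Delivery succeeded: delete the marked records.
    synchronized void confirm(Collection<Long> ids) {
        records.keySet().removeAll(ids);
        marked.removeAll(ids);
    }

    // Delivery failed: unmark the records so the next poll delivers them again.
    synchronized void release(Collection<Long> ids) {
        marked.removeAll(ids);
    }

    synchronized int size() {
        return records.size();
    }
}
```

The endpoint would call `markForDelivery()`, write the response, and then call `confirm(...)` or `release(...)` depending on whether writing threw an exception. A successful write on the server still does not prove that the client processed the data, which is why the other answers prefer an explicit acknowledgement or timestamp from the client.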

We did this with a special timestamp parameter.
Requests:
A request with timestamp.min returns all items and the server timestamp;
A request with the timestamp from the server returns the items from that timestamp on, deletes the previous ones, and returns the server timestamp.
Please note that we did all this with POST, which means that we effectively sent a command (not a query GET).
