Split a big Jira REST request - Java

I'm looking for a way to split a big request like:
rest/api/2/search?jql=(project in (project1, project2, project3....project10)) AND issuetype = Bug AND (component not in (projectA, projectB) OR component = EMPTY)
The result will contain more than 500 bugs, which makes it very slow. I want to fetch them with several requests (the method performing the request will be annotated with #Asynchronous), but the JQL needs to stay the same. I don't want to search separately for project1, project2, ..., project10. It would be nice if someone had an idea for solving this.
Thank you :)

You need to calculate the pagination yourself. First get the metadata:
rest/api/2/search?jql=[complete search query]&fields=*none&maxResults=0
You should get something like this:
{"startAt":0,"maxResults":0,"total":100,"issues":[]}
so no fields at all, just the pagination metadata.
Then create search URIs like this:
rest/api/2/search?jql=[complete search query]&startAt=0&maxResults=10
rest/api/2/search?jql=[complete search query]&startAt=10&maxResults=10
...etc.
Beware that the data may change between requests, so be prepared that you might not receive all of it, and that the pagination metadata (especially "total") may be omitted if calculating it is expensive. More: Paged API
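
For illustration, here is a rough Java sketch of that flow using the JDK 11 HttpClient. The base URL, JQL, page size and the substring parsing of "total" are all placeholders; a real implementation would use a JSON library and authentication, and the page fetches could equally live inside your #Asynchronous method.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class JiraPagedSearch {

        // Fetches the pagination metadata first, then requests every page in
        // parallel with the same JQL and a different startAt.
        public static List<String> searchInPages(String baseUrl, String jql, int pageSize)
                throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            String encodedJql = URLEncoder.encode(jql, StandardCharsets.UTF_8);

            // 1. Metadata only: no fields, no issues, just "total".
            String metaUrl = baseUrl + "/rest/api/2/search?jql=" + encodedJql
                    + "&fields=*none&maxResults=0";
            String meta = client.send(HttpRequest.newBuilder(URI.create(metaUrl)).build(),
                    HttpResponse.BodyHandlers.ofString()).body();
            // Crude extraction of "total"; use a JSON parser in real code.
            int total = Integer.parseInt(meta.replaceAll(".*\"total\":(\\d+).*", "$1"));

            // 2. One asynchronous request per page.
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int startAt = 0; startAt < total; startAt += pageSize) {
                String pageUrl = baseUrl + "/rest/api/2/search?jql=" + encodedJql
                        + "&startAt=" + startAt + "&maxResults=" + pageSize;
                futures.add(client.sendAsync(
                                HttpRequest.newBuilder(URI.create(pageUrl)).build(),
                                HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body));
            }

            List<String> pages = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                pages.add(future.join());
            }
            return pages;
        }
    }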

Can you not break it into two parts? If you are displaying the results in a web page, display what you can without a performance hit. If it's a report, fetch all the objects gradually and show them once completed.
Get the total count for the JQL along with only the minimum information needed for step 2 - assume it's 900.
Use the pagination feature (maxResults=100) and make multiple calls.
Work on each request.

If you don't want to run the two requests at once and need to page the bugs on user request, you can:
Make a request with the 'maxResults' property set to how many you need.
On the next request, set the 'maxResults' property and set 'startAt' to the same value.
If you need to fetch more data, make a new request with the same 'maxResults' but update 'startAt' to the count of bugs you fetched in the previous requests.


Should filter logic be on the frontend or the backend?

I'm creating a web application.
Frontend: React; backend: Java.
Frontend and backend communicate with each other via REST.
On the UI I show a list of items, and I need to filter them by some parameters.
Option 1: filter logic is on the frontend
In this case I just need to make one GET call to the backend and fetch all items.
After the user chooses a filter option, the filtering happens on the UI.
Pros: I don't need to send data to the backend and wait for a response, so the list refreshes faster.
Cons: if I ever need multiple frontend clients, say a mobile app, I have to implement the filters again in that app too.
Option 2: filter logic is on the backend
In this case I get the full list of items when the app loads. After the user changes the filter options, I send a GET request with the filter params and wait for the response.
After that I update the list of items on the UI.
Pros: the filter logic is written only once.
Cons: it will probably be much slower, because it takes time to send the request and get the result back.
Question: where should the filter logic live, frontend or backend? What is best practice?
Filter and limit on the backend. If you had a million records and a hundred thousand users trying to access those records at the same time, would you really want to send a million records to EVERY user? It would kill your server and the user experience: waiting for a million records to come back from the backend for every user, and then be processed on the frontend, would take ages compared to getting 20-100 records and clicking a pagination button to retrieve the next 20-100. On top of that, filtering a million records on the frontend would, again, take a very long time and ultimately isn't very practical.
From a real-world standpoint, most websites have some sort of record limit: eBay = 50-200 records, Amazon = ~20, Target = ~20, etc. This ensures quick server responses and a smooth user experience for every user.
This depends on the size of your data.
For example, if you have a large amount of data, it is better to implement the filter logic on the backend and let the database perform the operations.
If you have a small amount of data, you can do the filtering on the frontend after fetching the data.
Let's illustrate with an example.
Suppose you have an entity with 100,000 records and you want to show it in a grid.
In this case it is better to fetch 10 records per call and show them in the grid.
If you want to filter this data, it is better to run a query against the database on the backend and return the results.
If your entity has only 1,000 records, it will be more beneficial to fetch all the data and do all the filtering on the frontend.
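
As an illustration of letting the database do both the filtering and the paging, here is a minimal JDBC sketch. The table and column names are made up, and the LIMIT/OFFSET syntax shown here varies between databases.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;

    public class ItemRepository {

        // Returns one page of items matching a filter; the database does both
        // the filtering (WHERE) and the paging (LIMIT/OFFSET).
        public List<String> findByCategory(Connection connection, String category,
                                           int page, int pageSize) throws Exception {
            String sql = "SELECT name FROM items WHERE category = ? "
                       + "ORDER BY name LIMIT ? OFFSET ?";
            try (PreparedStatement statement = connection.prepareStatement(sql)) {
                statement.setString(1, category);
                statement.setInt(2, pageSize);
                statement.setInt(3, page * pageSize);
                try (ResultSet resultSet = statement.executeQuery()) {
                    List<String> names = new ArrayList<>();
                    while (resultSet.next()) {
                        names.add(resultSet.getString("name"));
                    }
                    return names;
                }
            }
        }
    }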
Most likely begin with the frontend (unless you're dealing with huge amounts of data):
1. Implement filtering on the frontend (unless for some reason it's easier to do on the backend, which I find unlikely).
2. Iterate until the filtering functionality is reasonably stable.
3. Analyze your traffic and see whether it makes sense to put the effort into backend filtering: what percentage of requests are actually filtered, and what savings would backend filtering bring?
4. Implement (or don't) backend filtering depending on the results of #3.
As a personal note, the accepted answer is terrible advice:
"If you had a million records, and a hundred thousand users trying to access those records at the same time": nothing forces those hundred thousand users to use filtering, so your system should be able to handle that doomsday scenario anyway. Backend filtering should be just an optimization, not the solution.
Once you do filtering on the backend you'll probably want pagination as well, and that is not a trivial feature if you want consistent results.
Backend filtering is likely to become much more complex than frontend filtering; be aware that you'll spend a significant amount of time (not only on the initial implementation but also on ongoing maintenance) and ask yourself whether it isn't premature optimization.
TL;DR: do whichever is easier for you and don't worry about it until it makes sense to start optimizing.
It depends on the specific requirements of your application, but in my opinion the safer bet is the backend.
Considering you need filtering in the first place, I assume you have enough data that paging through it is required. In that case, you need the filtering on the backend.
Let's say you have a page size of 20. After applying the filter you would expect a page of 20 entities that match the filter criteria in the UI. That can't be achieved if you fetch 20 entities, store them in the frontend and apply the filter to them afterwards.
Also, if you have enough data, fetching all of it into the frontend will be impossible due to memory constraints.

Elasticsearch - build scroll id manually

I have a case in which I can't make requests just to get the scroll_id - I have to manage it somehow so I can build the URLs for the next pages offline (I am making GET requests against a certain site that exposes its Elasticsearch instance).
So basically, I have a URL containing an Elasticsearch query and it returns only 20 results out of 40 (20 per request is the maximum size). I want a URL for the next pages - if I had the connection, I would just take the scroll_id from the first request and use it for the next ones.
But I want to avoid that and see if I can have a helper class that builds scroll ids by itself.
Is it possible?
Thanks in advance.
The scroll_id is tied directly to internal state (i.e. the context of the initial query) managed by ES, which eventually times out after a given period of time.
Once that period elapses, the search context is cleared and the scroll id is no longer valid. I'm afraid there's no way to craft a scroll id by hand.
But if the result set contains 40 results and you can only retrieve 20 at a time, I suggest you simply set from: 20 in your second query and you'll be fine.
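
For illustration, a tiny Java sketch of fetching the second page with from/size URL parameters. The base URL and query string are placeholders, and the query string is assumed to be already URL-encoded.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class EsFromSize {

        // Fetches results 21-40 by using from/size instead of a scroll id.
        // queryString is assumed to be already URL-encoded, e.g. "user:kimchy".
        public static String secondPage(String baseUrl, String queryString) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            String url = baseUrl + "/_search?q=" + queryString + "&from=20&size=20";
            return client.send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString()).body();
        }
    }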

Query past the 500 limit in Gerrit REST API

I'm trying to get 2000 change results from a specific branch with a query request using Gerrit REST API in Java. The problem is that I'm only getting 500 results no matter what I add to the query search.
I have tried the options listed here but I'm not getting the 2000 results that I need. I also read that an admin can increase this limit but would prefer a method that doesn't require this detour.
So what I'm wondering is:
Is it possible to increase the limit without the need to contact the admin?
If not, is it possible to continue/repeat the query to get the remaining 1500 results, using a loop that queries for the next 500 results after the previous batch until I finally have 2000 results in total?
When using the list changes REST API, the results are returned as a list of ChangeInfo Elements. If there are more results than were returned, the last entry in that list will have a _more_changes field with value true. You can then query again and set the start option to skip over the ones that you've already received.
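
For illustration, a rough Java sketch of that loop. The host, query and page size are placeholders; a real client should strip Gerrit's ")]}'" prefix and parse the JSON instead of the substring check used here.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class GerritPager {

        // Pages through /changes/ with the S (start) offset until _more_changes
        // disappears or enough results have been collected.
        public static List<String> fetchChanges(String host, String query,
                                                int pageSize, int wanted) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            List<String> pages = new ArrayList<>();
            int start = 0;
            while (start < wanted) {
                String url = host + "/changes/?q="
                        + URLEncoder.encode(query, StandardCharsets.UTF_8)
                        + "&n=" + pageSize + "&S=" + start;
                HttpResponse<String> response = client.send(
                        HttpRequest.newBuilder(URI.create(url)).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                // Real code: strip the ")]}'" prefix and parse the JSON; the last
                // ChangeInfo carries "_more_changes": true when more are available.
                pages.add(response.body());
                if (!response.body().contains("\"_more_changes\"")) {
                    break;
                }
                start += pageSize;
            }
            return pages;
        }
    }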
I want to add a minor workaround to David's great answer.
If you want to crawl Gerrit instances hosted on Google servers (such as Android, Chromium, Golang), you will notice that they block queries with more than 10000 results. You can check this e.g. with
curl "https://android-review.googlesource.com/changes/?q=status:closed&S=10000"
I solved the problem by splitting the list of changes with after: and before: in the query string, for example like
_url_/changes/?q=after:{2018-01-01 00:00:00.000} AND before:{2018-01-01 00:59:59.999}
_url_/changes/?q=after:{2018-01-01 01:00:00.000} AND before:{2018-01-01 01:59:59.999}
_url_/changes/?q=after:{2018-01-01 02:00:00.000} AND before:{2018-01-01 02:59:59.999}
and so on. I think you get the idea. ;-) Please note that both limits (before: and after:) are inclusive! Within each time window I use the pagination described by David.
A nice side effect is that you can track the progress of the crawl.
I wrote a small Python tool named "Gerry" to crawl open source instances. Feel free to use it, adapt it, and send me pull requests!
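
If it helps, here is a rough Java sketch of how such time windows could be generated (the base path, one-hour window size and start date are made up; Gerry itself is written in Python):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class WindowedQueries {

        private static final DateTimeFormatter FORMAT =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

        // Prints one query per hour for a single day. Since after: and before:
        // are both inclusive, each window ends one millisecond before the next
        // one starts. URL-encode the query before sending it.
        public static void main(String[] args) {
            LocalDateTime start = LocalDateTime.of(2018, 1, 1, 0, 0);
            LocalDateTime end = start.plusDays(1);
            for (LocalDateTime t = start; t.isBefore(end); t = t.plusHours(1)) {
                String query = "after:{" + FORMAT.format(t) + "} AND before:{"
                        + FORMAT.format(t.plusHours(1).minusNanos(1_000_000)) + "}";
                System.out.println("/changes/?q=" + query);
            }
        }
    }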
I had almost the same problem. Since, as you mentioned, you don't want to ask an admin to increase the query limit, there is no way around it: I suggest you follow your second approach and fire the REST query in a loop with a counter. That's how I implemented the REST client in Java.

Is there an easy way to get the Nth page of items from DynamoDB in Java?

I am working on a web app backed by Amazon DynamoDB.
I want my users to be able to jump directly to the Nth page to view item info.
I have been told that pagination in DynamoDB is based on the last evaluated key rather than limit/offset; it doesn't natively support offsets (see DynamoDB Scan / Query Pagination).
Does that mean that if I want to get to the 10th page of items, I have to query the 9 pages before it first? (Which seems really not a good solution.)
Is there an easier way to do that?
You are right. DynamoDB doesn't support a numerical offset. The only way to paginate is to pass the previous response's LastEvaluatedKey as the ExclusiveStartKey of the next request. You still have some good options to achieve pagination by page number.
Fast Cursor
You can make fast pagination requests by discarding the full results and fetching only the keys. You are limited to 1MB per request, but that represents a large number of keys! Using this, you can move your cursor to the required position and then start reading full objects.
This solution is acceptable for small/medium datasets. You will run into performance and cost issues on large datasets.
Numerical index
You can also create a global secondary index on which you paginate your dataset. For example, you can add an offset property to all your objects and query this global index directly to get the desired page.
Obviously this only works if you don't use any custom filter, and you have to maintain this value when inserting/deleting/updating objects. So this solution is only good if you have an 'append-only' dataset.
Cached Cursor
This solution builds on the first one, but instead of fetching keys every single time, you cache the page positions and reuse them for other requests. Cache tools like Redis or Memcached can help you achieve that:
You check the cache to see if the pages are already calculated.
If not, you scan your dataset fetching only keys, then store the starting key of each page in your cache.
You request the desired page to fetch the full objects.
Choose the solution that fits your needs. I hope this will help you :)
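
For illustration, a minimal sketch of the cursor-walking idea with the AWS SDK for Java v2. The table name and page size are made up, and a real implementation would project only the keys while skipping ahead, as described above.

    import java.util.Map;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
    import software.amazon.awssdk.services.dynamodb.model.ScanResponse;

    public class NthPageScanner {

        // Walks the cursor forward page by page and returns page N (1-based).
        public static ScanResponse fetchPage(DynamoDbClient dynamoDb, int pageNumber, int pageSize) {
            Map<String, AttributeValue> startKey = null;
            ScanResponse response = null;
            for (int page = 1; page <= pageNumber; page++) {
                ScanRequest.Builder builder = ScanRequest.builder()
                        .tableName("Items")   // placeholder table name
                        .limit(pageSize);
                if (startKey != null) {
                    builder.exclusiveStartKey(startKey);
                }
                response = dynamoDb.scan(builder.build());
                startKey = response.lastEvaluatedKey();
                if (startKey == null || startKey.isEmpty()) {
                    break;                    // no more pages
                }
            }
            return response;
        }
    }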

Get all the information from the server or do it in stages?

I want to get the name and picture of a Cub Scout, the Cub Scout details, the awards and the details for each award, and display all of this in a view. Is it best to get each set of details from the server side, pass it back to the client side and display it, or to get all the information at once?
I would opt for option 2. However, I thought I had better check.
Regards,
Glyn
Unless your query is returning some huge dataset, multiple queries probably aren't necessary. If load times are too slow when you implement your queries, consider paging your results.
When showing a view with all records from the database, it is usually good practice to show minimal data for each row, with a link to more details per row, for a better user experience and a faster query.
When the user clicks on the details link, the detail data for that particular row should then be fetched.
I agree that if the information is not that big, it's probably a good idea to get it all at once. It all depends on your UX and your backend design.
Are you expecting to show the info according to user interactions?
Are you going to reuse part of that information somewhere else in your app?
Are you caching query results?
If you do, then you might consider having a few methods to get the information. Regarding your page controller, if you are not using Ajax, you may create a facade (the Facade pattern) to integrate all those partial methods.
Let me know if you have any questions.
Thanks!
#leo.
Get the benefits of asynchronous calls: load as little data as possible on page load and make asynchronous calls when the data is needed.
Two main reasons to go asynchronous:
It reduces the traffic between the client and the server.
Response times are faster, which increases performance and perceived speed.
