I am trying to get all the recent media by tag using the Instagram tag media endpoint. The purpose is to track all recent media for a set of tags. I have configured a scheduled task (with Java and Spring) that executes every hour, sends requests, and stores the data. Below is the execution sequence:
Send a GET request to Instagram with the previously stored max_tag_id (send null if there is no previous id)
Iterate through the results, extract next_max_tag_id from the pagination element, and store it in the database against the corresponding tag
Send the GET request again with the new max_tag_id and continue
Stop if next_url in the result is null or the number of media returned is less than 20 (the configured page size)
Once the execution finishes, the next execution (after, let's say, an hour) starts with the previously stored max_tag_id.
The issue I am seeing is that I never get recent documents in subsequent executions. As per the documentation, passing max_tag_id in the request should return all the media after that id; however, that is not happening. I keep getting old media.
If I want the recent documents in every execution, do I need to pass a null max_tag_id in the first request of every execution? If I do that, won't I get redundant documents in every execution? I tried asking Instagram but haven't gotten a response. Also, their documentation explains little about pagination, so I'm not sure whether pagination for the recent media endpoint works backwards.
If you want the most recent media, don't use max_tag_id: passing max_tag_id returns all media dated before that id.
Instead, get the min_tag_id and store it. The next hour, start by making a call with only min_tag_id; if pagination.next_url is present, use it to get the next set of 20, and repeat until pagination.next_url no longer exists. Then use the newly stored min_tag_id to make calls the following hour.
The very first time, make the call without max_tag_id or min_tag_id.
You can also set &count=32 to get 32 posts with every API call (32 is the max in my experience).
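The hourly flow above comes down to building the right URL for each call. A minimal sketch of that URL-building step, assuming the legacy Instagram API v1 endpoint shape (the base URL, parameter names, and class name here are illustrative, not taken from the question):

```java
// Sketch of the URL-building step only, assuming the legacy Instagram
// API v1 tag media endpoint. The actual HTTP call and JSON parsing are
// left out on purpose.
public class TagMediaUrls {
    static final String BASE = "https://api.instagram.com/v1/tags/%s/media/recent";

    // Very first call ever: no min_tag_id or max_tag_id at all.
    public static String initialUrl(String tag, String accessToken) {
        return String.format(BASE, tag) + "?access_token=" + accessToken + "&count=32";
    }

    // First call of every subsequent hourly execution: only min_tag_id.
    public static String hourlyUrl(String tag, String accessToken, String minTagId) {
        return initialUrl(tag, accessToken) + "&min_tag_id=" + minTagId;
    }

    public static void main(String[] args) {
        System.out.println(initialUrl("java", "TOKEN"));
        System.out.println(hourlyUrl("java", "TOKEN", "1370433875471"));
    }
}
```

Within one execution you then just follow pagination.next_url verbatim until it disappears, and persist the latest min_tag_id for the next run.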
Consider two REST APIs implemented using Java and Spring.
The first (https://example.com/some-service/Requests), which our application consumes, raises a request. It does some processing at the back end, but it does not return the actual result; it only returns a success response.
The actual response to the request takes some time, for example 13 minutes, and is sent through a second API that our application exposes, for example <ourApplication.com/notifyRaisedRequests>.
I want to write Java code such that the response from the 2nd API must arrive within 13 minutes of raising the request via the 1st API. If more than 13 minutes pass, execute the failure part of the code; if less, execute the success part.
How can this be achieved using Spring Boot and Java? There should be some way to keep checking whether the 2nd API was called within the time interval (13 minutes). Any help is appreciated.
The code which raises the request:
Response r = post("/Requests");
db.insert(r.getUniqueId(), now);
The code which receives the notification:
void listener(NotificationRequest r) {
    db.delete(r.getUniqueId());
    // do the success action
}
The code which runs periodically checking the outstanding requests:
for (DbRecord r: db.selectAllRecords()) {
    if (now - r.time > 13 minutes) {
        db.delete(r.id);
        // do the failure action
    }
}
This is all pseudocode obviously.
This isn't really specific to Java, it sounds more like an architectural concern.
A user or a process requests resource X by calling the first api
An "OK" (200) response is returned
A user or a process requests resource Y (notify) by calling the second api
If less than Z amount of time has passed between the call to X and the call to Y, then send response A; otherwise send response B
One way to tackle this, as suggested in the comments, is to use a database to keep a unique record with a timestamp for each request to the first API (X). When the second API (Y) is called, it's straightforward: get the difference in time and act accordingly.
As for generating a unique identifier for each request to the first api, depending on where the record is stored you could do different things but my suggestion would be to keep it really simple and use a random uuid: UUID.randomUUID().
This implies that the response the first api sends back to whatever requested it will have that same uuid in it somewhere since this is necessary for the second api.
For example:
request to API X is received by a Spring REST controller (or similar). Somewhere in there (e.g. a service layer called by the controller) a unique identifier is created and a record is saved in the database. This unique identifier must be returned in the response.
request to API Y is made with the uuid returned in X's response. This can then be used to retrieve the information that was saved in the database for that request (such as the timestamp).
EDIT
As per more indications in the comments, three components are needed: a rest controller, a service for data storage / retrieval and a scheduler.
A call is made to the first API via the controller, which uses the service to store information specific to that request. The controller's response returns the uuid to the user for reference.
A scheduler uses that same service to periodically check whether the second call has been made and decides what to do based on the amount of time elapsed.
If the second call is made, it also uses the same service to save the relevant data and/or delete old records as necessary.
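A minimal sketch of the tracker those three components share, written in plain Java (no Spring annotations) so the timeout logic is visible in isolation. Class and method names are made up; in a real app, register() would sit behind the first controller, confirm() behind the notification endpoint, and sweep() inside an @Scheduled method. Time is passed in explicitly to keep the sketch testable:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the database table of outstanding requests.
public class RequestTracker {
    private final Map<String, Long> pending = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public RequestTracker(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called when the first API is invoked; the returned uuid goes in the response.
    public String register(long nowMillis) {
        String id = UUID.randomUUID().toString();
        pending.put(id, nowMillis);
        return id;
    }

    // Called by the notification endpoint; true means the success path (within the deadline).
    public boolean confirm(String id, long nowMillis) {
        Long started = pending.remove(id);
        return started != null && nowMillis - started <= timeoutMillis;
    }

    // Called periodically; removes timed-out requests and returns how many failed.
    public int sweep(long nowMillis) {
        int failures = 0;
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis && pending.remove(e.getKey()) != null) {
                failures++; // failure action would run here
            }
        }
        return failures;
    }
}
```

In production you would back this with the database rather than a map, so the state survives restarts.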
I would suggest you analyze your code and API calls first: find exactly where the time is being spent, e.g. fetching data from the DB (in which case you need to optimize your DB query) or processing the records from the client/DB, etc.
You can use any monitoring tool on the market for this first analysis. You will get an idea of your API calls, method execution times, thread dumps, etc.
This link gives some tool names.
https://geekflare.com/api-monitoring-tools/
If you are using Spring, then use Spring Boot's built-in production-ready features to get the metrics of the application.
Use the below link for more reference::
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html
I have a case in which I shouldn't make requests to get the scroll_id; I have to manage it somehow so that I can build the URLs for the next pages offline (I am making GET requests against a site that exposes its Elasticsearch instance).
So basically, I have a URL containing an Elasticsearch query and it returns only 20 results out of 40 (20 per request is the max size). I want to get a URL for the next pages. Given an Internet connection, I would just get the scroll_id from the first request and use it to make the next ones.
But I want to avoid it and see if I can have a helper class that builds scroll ids by itself.
Is it possible?
Thanks in advance.
The scroll_id ties directly to some internal state (i.e. the context of the initial query) managed by ES, which eventually times out after a given period of time.
Once the period times out, the search context is cleared and the scroll id is no longer valid. I'm afraid there's no way you can craft a scroll id by hand.
But if the result set contains 40 results and you can only retrieve 20 at a time, I suggest you simply set from: 20 in your second query and you'll be fine.
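Since the whole result set is only 40 hits, from/size paging is just a matter of building two request bodies. A small sketch of that construction; the match_all query is a placeholder for the site's actual query:

```java
// Sketch: page a small, fixed result set with from/size instead of scroll.
// The query JSON passed in is whatever the site's URL already encodes.
public class FromSizePaging {
    public static String pageQuery(String queryJson, int from, int size) {
        return "{\"from\":" + from + ",\"size\":" + size + ",\"query\":" + queryJson + "}";
    }

    public static void main(String[] args) {
        String q = "{\"match_all\":{}}";
        System.out.println(pageQuery(q, 0, 20));   // first 20 hits
        System.out.println(pageQuery(q, 20, 20));  // remaining 20 hits
    }
}
```

Both bodies can be prepared fully offline, unlike a scroll id, which only exists server-side.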
I'm looking for a way to split a big request like:
rest/api/2/search?jql=(project in (project1, project2, project3....project10)) AND issuetype = Bug AND (component not in (projectA, projectB) OR component = EMPTY). The result will contain > 500 bugs, which is very, very slow. I want to fetch them with several requests (the method performing the request will be annotated with @Asynchronous), but the JQL needs to stay the same; I don't want to search separately for project1, project2, ... project10. It would be nice if someone had an idea to solve my problem.
Thank you :)
You need to calculate pagination. First get the metadata.
rest/api/2/search?jql=[complete search query]&fields=*none&maxResults=0
you should get something like this:
{"startAt":0,"maxResults":0,"total":100,"issues":[]}
so completely without fields, just pagination metadata.
Then create search URIs like this:
rest/api/2/search?jql=[complete search query]&startAt=0&maxResults=10
rest/api/2/search?jql=[complete search query]&startAt=10&maxResults=10
..etc
Beware: the data may change between calls, so be prepared that you won't receive all of it; also, the pagination metadata (especially "total") may be absent if its calculation is expensive. More: Paged API
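The startAt arithmetic above can be sketched as a small helper that turns the "total" from the metadata call into the list of paged URIs; the jql value here is a placeholder:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: derive the paged search URIs from the "total" value returned by
// the metadata call (maxResults=0).
public class JiraPaging {
    public static List<String> pagedUris(String jql, int total, int pageSize) {
        List<String> uris = new ArrayList<>();
        for (int startAt = 0; startAt < total; startAt += pageSize) {
            uris.add("rest/api/2/search?jql=" + jql
                    + "&startAt=" + startAt + "&maxResults=" + pageSize);
        }
        return uris;
    }
}
```

Each URI can then be fetched by its own @Asynchronous call, all with the identical JQL.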
Can you not break it into 2 parts? If you are displaying it in a web page, display what you can without a performance hit. If it's a report, fetch all objects gradually and show them once completed.
Get the total count for the JQL and just the minimum information needed for step 2 - assume it's 900
Use the pagination feature (maxResults=100) and make multiple calls.
Work on each request.
If you don't want to run the two requests at once and need paging of bugs by user request, you can:
Make a request with the 'maxResults' property set to how much you need.
On the next request, keep the 'maxResults' property the same and set 'startAt' to that value.
If you need to fetch more data, make a new request with the same 'maxResults' but update 'startAt' to be the count of bugs fetched in the previous requests.
I have been given an API link in the form of a URL and query string, and the following is my approach:
The query string format means that a GET request is to be fired.
I also assume that this can be done with HttpURLConnection in Java.
I have a list of data that I'm retrieving from the DB.
How would I fire a request for each item in the list? Is a simple for loop not going to be enough for such a sophisticated task?
The API link is a plain link with a query string, to which data from the DB is appended one item at a time.
Would like to hear how you would approach this task and see if my approach lacks somewhere.
You are right to doubt the simple for loop approach. It would be slow: each request is blocking, so you'll be waiting for the result of request 1 before firing request 2. Look into doing this asynchronously, firing multiple requests at once.
It's hard to say more without details on the API. Is it an online web service? Something internal created by another department? If a batch variant does not exist, consider asking for a version of that function that can receive multiple parameters at once, instead of having to make tons of tiny calls.
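A minimal sketch of the concurrent approach using an ExecutorService; the fetch function is a stand-in for the real HttpURLConnection call, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Fires one task per DB item on a fixed pool instead of a sequential loop.
public class ParallelCalls {
    public static <T, R> List<R> callAll(List<T> items, Function<T, R> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                futures.add(pool.submit(() -> fetch.apply(item)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks, but all calls are already in flight
            }
            return results; // same order as the input list
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In real use, fetch would open the connection to the API link with the item appended to the query string; keep the pool size modest so you don't hammer the remote service.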
Scenario Imagine a REST service that returns a list of things (e.g. notifications)
Usage A client continually polls the REST service. The REST service retrieves records from the database; if records are available, they are converted into JSON and returned to the client, and at the same time the retrieved records are purged from the DB.
Problem How do you handle the case where the REST endpoint encounters a problem writing the results back to the client? By that time, the records have already been deleted.
Deleting the records will always be a dangerous proposition. What you could do instead is include a timestamp column on the data. Then have your REST url include a "new since" timestamp. You return all records from that timestamp on.
If the notifications grow to be too large you can always setup an automated task to purge records more than an hour old - or whatever interval works well for you.
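The "new since" idea can be sketched with an in-memory list standing in for the table; the Notification class and method names here are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: nothing is deleted on read. The client passes the last timestamp
// it saw; a separate purge task trims old rows.
public class NotificationFeed {
    public static class Notification {
        public final String text;
        public final long createdAt;
        public Notification(String text, long createdAt) {
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    // What the "new since" endpoint returns for a given client timestamp.
    public static List<Notification> newSince(List<Notification> all, long since) {
        return all.stream().filter(n -> n.createdAt > since).collect(Collectors.toList());
    }

    // What the scheduled purge task keeps (everything newer than the cutoff).
    public static List<Notification> purgeOlderThan(List<Notification> all, long cutoff) {
        return all.stream().filter(n -> n.createdAt >= cutoff).collect(Collectors.toList());
    }
}
```

Because the read never mutates state, a failed write to the client costs nothing: the client just retries with the same timestamp.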
It sounds like a strange idea to delete DB records after read access. Possible problems immediately leap to mind: network trouble prevents the client from reading the data, multiple clients cause each other to see incomplete lists, etc.
The RESTful approach might be like this:
Give each notification a specific URI. Allow GET and DELETE on these URIs. The client may trigger the record deletion once it successfully received and processed the notification.
Provide a URI for the collection of current notifications. Serve a list of notification data (ID, URI, timestamp, (optional:) content) upon GET request. Take a look at the Atom protocol for ideas. Optional: allow POST to add a new notification.
With this approach all reading requests stay simple GETs. You may instrument the usual HTTP caching mechanisms on proxies and clients for performance improvement.
Anyway: deleting a DB entry is a state change on the server. You must not do this upon a GET request, so POST will be your primary choice. But this does not help you much, since the communication might still not be reliable. And polling with POSTs smells a lot more like web services than REST.
This could be a possible solution.
The REST service can query the database for a list of notifications.
It marks each item in the list, say by setting a flag in the database.
It then delivers all this notification records to the client.
If the server has successfully sent the results to the client, all marked records are deleted.
In case of failure, the marked records are unmarked, so that they get delivered during the subsequent polling interval.
I hope you got my point.
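The mark/deliver/confirm cycle described above can be sketched with an in-memory map standing in for the table; class and method names are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the mark-then-delete flow: rows are marked before delivery,
// deleted only after a confirmed send, and unmarked if the send fails.
public class MarkAndSweep {
    enum State { NEW, MARKED }
    private final Map<Long, State> rows = new LinkedHashMap<>();

    public void add(long id) { rows.put(id, State.NEW); }

    // Steps 1-2: mark every NEW row and hand the ids to the sender.
    public List<Long> markForDelivery() {
        List<Long> batch = rows.entrySet().stream()
                .filter(e -> e.getValue() == State.NEW)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        batch.forEach(id -> rows.put(id, State.MARKED));
        return batch;
    }

    // Step 3a: delivery succeeded, so delete the marked rows.
    public void deliverySucceeded(List<Long> batch) { batch.forEach(rows::remove); }

    // Step 3b: delivery failed, so unmark them for the next polling interval.
    public void deliveryFailed(List<Long> batch) {
        batch.forEach(id -> rows.put(id, State.NEW));
    }

    public int pendingCount() { return rows.size(); }
}
```

In a real service the two-state flag would live in the database so the mark survives a crash between delivery and confirmation.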
We did this with a special timestamp parameter.
Requests
Request with timestamp.min; this returns all items and a server timestamp.
Request with the timestamp from the server; this returns items from that timestamp on, deletes the previous ones, and returns a new server timestamp.
Please note that we did all this with POST, meaning that we effectively sent commands (not GET queries).