I have a small application in Java which searches for images using Bing image search. The problem I am facing is that it gets only the first 20 images, maybe because when we search on bing.com it populates the first 20 images and then loads more via infinite scrolling.
Is there any way to get more than 20 images using Bing?
Cheers :)
I'm guessing this is because the site uses Ajax to populate the "infinite" scrolling list, as you call it.
You probably send an HTTP request and get the initial page (by the way, on my browser I got 6 images across x 4 down, i.e. 24 rather than 20; thinking about it, maybe my client also got 20 at first and fetched the last 4 with Ajax...), and you'd need to do the paging through Ajax requests as well.
At a glance, the XHTML and associated JavaScript of the page are very dense and somewhat obfuscated; it would take a while to get oriented. An alternative to analyzing the page is to use a packet sniffer (such as Wireshark) and capture the requests which take place when you scroll down.
Essentially this will likely expose some form of Ajax request, which you can then easily emulate with Java. Typically the Ajax response is easy to parse whatever its nature (XML, JSON, gzip...).
A possible snag in this well-laid-out plan is if the data in the Ajax response is encrypted, for example if the extra images are bundled in some sort of envelope whose format you'd then need to discover.
Depending on the actual task at hand, you may try alternatives such as automations within GreaseMonkey (on Firefox) or similar tools.
What about the Bing API?
Note that all the approaches above amount to screen-scraping and are hence quite sensitive to even minute changes in the Bing application; depending on usage and context, this could also put the project in a legal grey area... A better approach may be to register, obtain a proper application ID with MS/Bing, and use the Bing API.
Are you simulating a browser? Doesn't the Bing engine have an entry point for programs instead, such as a web service, which would make your task much easier?
EDIT: SDK appears to be here: http://msdn.microsoft.com/en-us/library/cc980922.aspx
Just wanted to post a direct answer to the question:
Bing uses Ajax (of course) for the infinite scroll. Each "tick" is a simple Ajax GET request which acquires new images.
For instance, this URL returns 30 results (121-150) in "htmlraw" format for the query "max payne":
http://www.bing.com/images/async?q=max+payne&format=htmlraw&first=121
Edit:
It works with the original URL too; just add &first=NUMBER to the query string. Example:
www.bing.com/images/search?q=payne&go=&form=QBLH&scope=images&filt=all&first=10
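For instance, a minimal Java sketch of walking that endpoint page by page could look like the following. Note that this is scraping an undocumented endpoint: the URL, its parameters, and the apparent page size of 30 are only what was observed above and may change at any time.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class BingImageScraper {

    // Fetches one "page" of raw HTML results from the undocumented async endpoint.
    static String fetchPage(String query, int first) throws Exception {
        String url = "http://www.bing.com/images/async?q="
                + URLEncoder.encode(query, "UTF-8")
                + "&format=htmlraw&first=" + first;
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // Each request appears to return about 30 results, so step by 30.
        for (int first = 1; first <= 91; first += 30) {
            String html = fetchPage("max payne", first);
            System.out.println("offset " + first + ": " + html.length() + " chars");
            // Parse the image URLs out of 'html' here.
        }
    }
}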
I am building my own bulk image collector (as a "learning project" for myself) and I found out that it is paginated like this.
FYI, Google and Bing are easy; Yahoo and AltaVista (redundant, since their results come from Yahoo) are far more problematic, as they don't expose a direct link to the original image.
Have fun! :)
This can be done by using the count parameter. For example, I tried a GET request to "https://api.cognitive.microsoft.com/bing/v7.0/images/search?q=shoes&mkt=en-us&count=30" and it returned 30 images.
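For reference, a bare-bones Java call could look like the sketch below. The Ocp-Apim-Subscription-Key header is how the v7 API authenticates; YOUR_SUBSCRIPTION_KEY is a placeholder for a key from the Azure portal, and offset can be combined with count to page through results.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BingApiSearch {
    public static void main(String[] args) throws Exception {
        String key = "YOUR_SUBSCRIPTION_KEY"; // placeholder: your Azure key
        URL url = new URL("https://api.cognitive.microsoft.com/bing/v7.0/images/search"
                + "?q=shoes&mkt=en-us&count=30&offset=0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Ocp-Apim-Subscription-Key", key);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON body with a "value" array of images
            }
        }
    }
}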
Related
I am interfacing with Shopify, which uses a RESTful API. When I request a resource that returns an array of items, it uses the RFC 8288 (Web Linking) pagination format.
For example, https://example.com/api/inventory_levels.json?limit=10 returns 10 entities along with the following response header:
Link: <https://example.com/api/inventory_levels.json?limit=10&page_info=eyJs9pZHMiO>;
rel="previous", <https://example.com/api/inventory_levels.json?limit=10&page_info=MiZHeyJs9pO>; rel="next"
Apparently, if I want to retrieve all entities from that resource, I need to follow the 'next' URL until no more 'next' is returned. But how am I going to parse this info in Java or C# code? I could use a regular expression like <(?<next_url>.*)>; rel="next" to extract the 'next_url', but that feels like reinventing the wheel and not robust.
If this is a well-defined feature, shouldn't there be a readily available library/infrastructure for it? I just don't want to be caught by surprise if one day the formatting shows up differently (an extra space, say) and, while still abiding by the RFC definition, breaks my hastily scrambled-up regex solution.
Suggestions welcome for Java or C#.
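For what it's worth, a hand-rolled parser in Java is only a few lines. The sketch below handles the common shape (an angle-bracketed URL followed by parameters) but is not a full RFC 8288 implementation; for example, it would mis-split a URL that itself contains a comma. If you already depend on JAX-RS, its javax.ws.rs.core.Link class can parse individual link-values for you.

import java.util.HashMap;
import java.util.Map;

public class LinkHeaderParser {

    // Parses a Link header like
    //   <https://...&page_info=abc>; rel="previous", <https://...&page_info=def>; rel="next"
    // into a map of rel -> URL.
    static Map<String, String> parse(String header) {
        Map<String, String> links = new HashMap<>();
        for (String part : header.split(",")) {
            String[] segments = part.split(";");
            if (segments.length < 2) continue;
            String url = segments[0].trim();
            if (!url.startsWith("<") || !url.endsWith(">")) continue;
            url = url.substring(1, url.length() - 1);
            for (int i = 1; i < segments.length; i++) {
                String param = segments[i].trim();
                if (param.startsWith("rel=")) {
                    links.put(param.substring(4).replace("\"", ""), url);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String header = "<https://example.com/api/inventory_levels.json?limit=10&page_info=abc>; rel=\"previous\", "
                + "<https://example.com/api/inventory_levels.json?limit=10&page_info=def>; rel=\"next\"";
        // Follow the "next" URL until the header no longer contains one.
        System.out.println(parse(header).get("next"));
    }
}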
I'm using the Android Google Places API to autocomplete streets and addresses. The problem is that it returns streets from the whole country. Of course I added bounds to limit the search area, but it doesn't work correctly: the bounds only set a priority, so in other words the best results will just appear higher in the list, nothing more.
The code:
AutocompleteFilter typeFilter = new AutocompleteFilter.Builder()
        .setTypeFilter(AutocompleteFilter.TYPE_FILTER_ADDRESS)
        .setCountry("RU")
        .build();
Intent intent =
        new PlaceAutocomplete.IntentBuilder(PlaceAutocomplete.MODE_OVERLAY)
                .zzih(searchString) // passes the search string from the toolbar
                .setFilter(typeFilter)
                .setBoundsBias(city.getBounds())
                .build(this);
In short the problem is:
When I type something like "Lenina Street" into the search, I see a lot of useless results outside the bounds set in .setBoundsBias(city.getBounds()). Just imagine: something like "Lenina Street" exists in almost every locality!
How can I fix the problem and limit search results?
P.S.
I know I can use the Google Places Web API or GeoDataApi.getAutocompletePredictions() and filter the results manually, but that means I would have to write the UI manually too, which I don't want to do.
It's even worse than I thought. Even if I get results from the Web API or through GeoDataApi, I only have predictions, which don't contain coordinates, only a placeId. So if I want to filter predictions by coordinates I have to make a request for each placeId; in other words, if I got 20 places I would have to make 20 more requests just to find out the coordinates.
I can also add the city name to searchString, which improves the results somewhat (but not completely), but it makes entering the address awkward and the city name takes up space, so that's not a good solution either.
I'm afraid the Places API for Android doesn't support strict bounds yet. There is a feature request in the Google issue tracker to implement this:
https://issuetracker.google.com/issues/38188994
Feel free to star this feature request to add your vote and subscribe to notifications from Google.
In the meantime, a workaround might be to use the Places API web service, which supports strict bounds, and to implement the UI manually.
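A rough Java sketch of that web-service workaround is below. The location, radius, and key values are placeholders; strictbounds is the parameter that actually restricts predictions to the given circle instead of merely biasing them.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class StrictBoundsAutocomplete {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // placeholder: a server-side Places API key
        String input = URLEncoder.encode("Lenina Street", "UTF-8");
        // location is the circle's center and radius is in meters; with
        // strictbounds, predictions outside the circle are dropped entirely.
        String url = "https://maps.googleapis.com/maps/api/place/autocomplete/json"
                + "?input=" + input
                + "&location=55.7558,37.6173" // placeholder: your city's center
                + "&radius=20000"
                + "&strictbounds"
                + "&types=address"
                + "&key=" + apiKey;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON with a "predictions" array
            }
        }
    }
}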
UPDATE
The feature request was marked as Fixed by Google. Have a look at https://stackoverflow.com/a/50134855/5140781, which shows how to apply strict bounds in the Places API for Android.
I'm getting pretty familiar with using AsyncTasks to fetch data from API endpoints now. I can easily hit a URL and parse the JSON data that comes back.
However, I've run into a problem: this API has a lot of pages.
What's the best way to deal with an API that has a lot of pages and no option to change the number of results per page?
My particular endpoint has 40+ pages of data (12 results per page). Spinning up a new AsyncTask for each page's endpoint feels a bit ridiculous.
Any ideas?
Unfortunately, as everyone suggests, there is no way around the API if it does not support a results-per-page argument. You could prefetch a few pages and join them in one AsyncTask; that way you minimize the number of AsyncTasks forked from the main thread and have a strategy for when you need to load more pages.
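A minimal sketch of that idea: one AsyncTask walks the pages sequentially on the background thread instead of forking a task per page. The base URL, page parameter, and total page count are placeholders for whatever your endpoint uses.

import android.os.AsyncTask;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class FetchAllPagesTask extends AsyncTask<Void, Integer, List<String>> {

    private static final String BASE_URL = "https://api.example.com/items?page="; // placeholder
    private final int totalPages;

    public FetchAllPagesTask(int totalPages) {
        this.totalPages = totalPages;
    }

    @Override
    protected List<String> doInBackground(Void... ignored) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= totalPages && !isCancelled(); page++) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(BASE_URL + page).openStream()))) {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
                pages.add(body.toString());
                publishProgress(page); // lets the UI show incremental progress
            } catch (Exception e) {
                break; // stop on the first failure; partial results are still returned
            }
        }
        return pages;
    }
}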
I would definitely suggest you use the Retrofit HTTP client. I had a similar issue with 260+ calls, and Retrofit worked fine for me. Check it here.
I want to read a JSON list from a web service with Java. The web service returns a list of authors from Luxembourg, e.g. sorted by year. This is the website:
http://www.autorenlexikon.lu/page/periods/1919-1945/1/1/DEU/index.html
So far, I know that I can receive a JSON document with a request like this:
http://www.autorenlexikon.lu/mmp/json.document_list/DEU/0?search_since=1919&search_until=1945
But I only get the first 20 entries. How can I get the next 20? I think the solution is in the JavaScript code of the website, but I am pretty new to JavaScript (and to JSON).
EDIT:
There isn't any official API.
I have already tried:
http://www.autorenlexikon.lu/mmp/json.document_list/DEU/0?pageSize=1000&search_since=1919&search_until=1945
http://www.autorenlexikon.lu/mmp/json.document_list/DEU/0?page_Size=1000&search_since=1919&search_until=1945
...and many more. How does the JavaScript code receive all the entries? Couldn't I copy that mechanism?
You should check their API and look for a parameter that lets you define the page or the range of results you want to get.
Edit: It seems you'd have to make a POST request and pass the start index as well as the page size as POST parameters. For more information see matthijs koevoets' answer.
It depends on how the web service has been coded; this has nothing to do with JSON specifically. In the results you can see it says
"pageSize":20,
You just have to figure out how to call the web service with a different page size. It may not allow you to query it with one; that's up to the web service API as coded by its developers.
Their service seems to accept POST parameters only: sort=year&dir=asc&startIndex=0&results=100
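Based on that, a minimal Java sketch of such a POST request could look like this (untested against the live site; the form parameters are just the ones listed above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class AuthorListFetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.autorenlexikon.lu/mmp/json.document_list/DEU/0"
                + "?search_since=1919&search_until=1945");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Paging parameters as observed from the site's own requests.
        String form = "sort=year&dir=asc&startIndex=0&results=100";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(form.getBytes("UTF-8"));
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // the JSON document list
            }
        }
    }
}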
I am working on a Java Twitter app (using the Twitter4J API). I have created the app and can view the current user's timeline, users' profiles, etc.
However, when using the app it seems to quite quickly exceed the 150-requests-per-hour rate limit set on Twitter clients (I know developers can increase this to 350 for given accounts, but that would not solve it for other users).
Surely this is not affecting all clients; any ideas on how to get around this?
Does anyone know what counts as a request? For example, when I view a user's profile, I load the User object (Twitter4J) and then get the screen name, username, user description, user status, etc. to put into a JSON object. Would this be a single call to get the object, or several, counting all the user.get...() calls?
Thanks in advance
You really do need to keep track of your current request count when dealing with Twitter.
However, Twitter does not seem to decrement the count for 304 Not Modified responses (at least it didn't the last time I dealt with it), so make sure there isn't something breaking your normal use of HTTP caching, and your practical requests per hour will go up.
Note that Twitter suffers from a bug in mod_gzip on Apache where the ETag is malformed when it is changed to reflect that the content encoding differs from that of the non-gzipped entity (changing it is the Right Thing to do; there's just a bug in the implementation). Because of this, accepting gzipped content from Twitter means it will never send a 304, which increases your request count and in many cases undermines the efficiency gains of using gzip.
Hence, if you are accepting gzip (your web library may do so by default; check with a tool like Fiddler. I'm a .NET guy with only a little Java knowledge, answering at the level of how Twitter deals with HTTP, so I don't know the details of Java web libraries), try turning that off and see if it improves things.
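For example, with plain HttpURLConnection you could force uncompressed conditional requests along these lines. This is only a sketch; note that Android's HttpURLConnection adds gzip to Accept-Encoding by default, whereas desktop Java's generally does not.

import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalGet {
    // Issues a conditional GET without gzip so the server can reply 304.
    // cachedEtag is the ETag header value saved from an earlier response.
    static boolean isModified(String urlString, String cachedEtag) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestProperty("Accept-Encoding", "identity"); // refuse gzip
        if (cachedEtag != null) {
            conn.setRequestProperty("If-None-Match", cachedEtag);
        }
        return conn.getResponseCode() != HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}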
Almost every type of read from Twitter's servers (i.e. anything that issues an HTTP GET) counts as a request. Getting user timelines, retweets, direct messages, and user data all count as 1 request each. Pretty much the only Twitter API call that reads from the server without counting against your API limit is checking the rate limit status.
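If you want to watch the count from code, checking the limit is a single call in Twitter4J. This sketch assumes the Twitter4J 2.x API that was current when the 150/350 hourly limits applied; later versions changed the return type.

import twitter4j.RateLimitStatus;
import twitter4j.Twitter;
import twitter4j.TwitterFactory;

public class RateLimitCheck {
    public static void main(String[] args) throws Exception {
        // Credentials are read from twitter4j.properties.
        Twitter twitter = new TwitterFactory().getInstance();
        // This call itself does not count against the hourly limit.
        RateLimitStatus status = twitter.getRateLimitStatus();
        System.out.println("Remaining: " + status.getRemainingHits()
                + " of " + status.getHourlyLimit()
                + ", resets at " + status.getResetTime());
    }
}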