HTTP GET request - what data is actually sent? - java

I'm currently building a web spider in Java with Apache Commons. I'm crawling basic Google search queries like https://google.com/search?q=word&hl=en
Somehow, after about 60 queries I get blocked. It seems they recognize me as a bot, and I get a 503 Service Unavailable response.
Now the important part:
If I visit the same site with Firefox or Chrome, I get the desired result.
If I make a GET request with my application using the same HTTP headers (user-agent, cookies, cache etc.), I am still blocked.
HOW does Google know whether I'm connecting via my application or the Chrome browser, when the only information is the IP and the HTTP headers? (Maybe I'm wrong?)
Are there more parameters by which my app can be recognized? Something that Google sees and I don't?
(Maybe important: I'm using Chrome Developer Tools and httpbin.org to compare the headers of the browser and the application.)
Thanks a lot

Since you have not specified how quickly you send the 60 queries, I am assuming you send them at a high rate. That is why Google is blocking you. Several times I have done Google searches in rapid succession from Chrome; after a while it asks for a CAPTCHA and then blocks me soon after.
Please see the Custom Search API and this post about the terms of service: Replacement for Google API
FAQ on blocked searches: Google FAQ
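If the request rate really is the trigger, as assumed above, spacing the queries out is the first thing to try. Here is a minimal sketch of a throttle with a randomized delay. The class name `PoliteThrottle` and the delay values are my own illustrative assumptions, not thresholds published by Google.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: spaces out successive requests by a base delay plus random jitter. */
final class PoliteThrottle {
    private final long baseDelayMillis;
    private final long maxJitterMillis;
    private long lastRequestAt = 0L; // 0 means no request has been made yet

    PoliteThrottle(long baseDelayMillis, long maxJitterMillis) {
        if (baseDelayMillis < 0 || maxJitterMillis < 0) {
            throw new IllegalArgumentException("delays must not be negative");
        }
        this.baseDelayMillis = baseDelayMillis;
        this.maxJitterMillis = maxJitterMillis;
    }

    /** Returns the base delay plus a random jitter in [0, maxJitterMillis]. */
    long nextDelayMillis() {
        return baseDelayMillis + ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
    }

    /** Blocks until the chosen delay has elapsed since the previous call returned. */
    void awaitTurn() {
        if (lastRequestAt != 0L) {
            long waitUntil = lastRequestAt + nextDelayMillis();
            long now = System.currentTimeMillis();
            if (now < waitUntil) {
                try {
                    Thread.sleep(waitUntil - now);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                }
            }
        }
        lastRequestAt = System.currentTimeMillis();
    }
}
```

Call `awaitTurn()` right before each GET request. The first call returns immediately; later calls wait. This only lowers the rate, so it may not be enough on its own to avoid the block.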

Related

Java, Client-Side NTLM Authentication

I'm charged with building an app health monitoring system, and one of the requirements is to check both whether our various Java services are up and whether our reverse proxy is up. The reverse proxy is a .NET application that gets a user's AD groups and passes them along to our apps via a header for monitoring/security reasons.
Obviously the rinky dink input and output stream http request isn't enough, but I haven't found information or guides about client-side NTLM authentication for Java.
I'm guessing there's a library (probably Oracle or Apache) that provides a handler that can do this, but I have come up empty trying to find it. Please help.
Great code and context for the client side can be found here:
http://davenport.sourceforge.net/ntlm.html#appendixD
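As a possible starting point that needs no extra library: the JDK's `HttpURLConnection` can negotiate NTLM itself when a `java.net.Authenticator` supplies the credentials. The sketch below only registers such an authenticator. The class name `NtlmCredentials` and the `DOMAIN\user` account format are my assumptions about the setup, so check them against your AD environment.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Hypothetical helper: registers domain credentials for HttpURLConnection to use. */
final class NtlmCredentials {
    /** Installs a JVM-wide default Authenticator that answers with DOMAIN\\user credentials. */
    static void install(String domain, String user, char[] password) {
        final String qualifiedUser = domain + "\\" + user;
        final char[] secret = password.clone(); // keep our own copy of the password
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(qualifiedUser, secret);
            }
        });
    }
}
```

After calling `NtlmCredentials.install(...)` once at startup, a plain `HttpURLConnection` request to the reverse proxy should pick the credentials up when the server challenges for them. Note that the default authenticator applies to the whole JVM.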

How to add data to Google Analytics using Java (programmatically)

At the moment, I am trying to understand how to add data to GA. I can read the data from my GA account using the Core Reporting API and the Management API without any problems. But now I want to add data (the number of phone calls) to my GA account programmatically. Can somebody explain to me, step by step, how I can do this?
The Measurement Protocol is how we send data to Google Analytics. The JavaScript snippet that we use in our websites also uses the Measurement Protocol, as do the SDKs for Android and iOS. Unfortunately, there are no (official) SDKs for other languages such as Java.
That being said, you can technically code it yourself in any language that can handle an HTTP GET or an HTTP POST. I have personally done it for C#.
POST /collect HTTP/1.1
Host: www.google-analytics.com
payload_data
The following parameters are required for each payload:
v=1 // Version.
&tid=UA-XXXXX-Y // Tracking ID / Property ID.
&cid=555 // Anonymous Client ID.
&t= // Hit Type.
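In Java, the required parameters above could be assembled roughly like this. This is a minimal sketch: the class name `MeasurementPayload` and its `build` method are hypothetical, and the tracking ID and client ID are the same placeholders as in the listing above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a URL-encoded Measurement Protocol payload. */
final class MeasurementPayload {
    /** Puts the four required parameters first, then any hit-specific extras. */
    static String build(String trackingId, String clientId, String hitType,
                        Map<String, String> extra) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("v", "1");          // protocol version
        params.put("tid", trackingId); // tracking ID / property ID
        params.put("cid", clientId);   // anonymous client ID
        params.put("t", hitType);      // hit type, e.g. pageview or screenview
        params.putAll(extra);

        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

The resulting string can be sent as the body of the POST to /collect, or appended after a `?` to a GET request URL.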
A few tips to get you started.
Check out validating hits; this is very useful in the beginning for debugging your requests.
Some of the parameters are only valid for certain hit types. Make sure you check the documentation.
cid is just a string and can be anything. Most people send a GUID; the server basically uses it to identify a unique session.
If you are doing this for an application Google Analytics account, remember to send screenview, not pageview. The same goes for a web application.
Check the real-time report in Google Analytics to see if your hits are getting recorded.
Update for question in comment:
While you are getting the idea of this, I recommend that you start by just using HTTP GET in a web browser. It's easier to test your requests against the debug endpoint that way. For example, put this in a browser:
https://google-analytics.com/debug/collect?v=1&tid=UA-123456-1&cid=5555&t=pageview&dp=%2FpageA
dp is the document path; I am not sure why it requires you to send that.
Screenview hit type vs. pageview hit type:
There are two types of Google Analytics accounts: ones for applications (such as Android applications, or sometimes web applications) and ones for websites. Application accounts are meant to be used with the screenview hit type (the user looks at a screen in the application), and web accounts use pageviews (the user views a web page). If you send a pageview to an application account, it will accept the hit, but there will be no way for you to see the data. If you send a screenview to a web account, it will again accept the data, but you won't see it.

Jsoup, Reddit, OAuth2, and 429 HTTP Errors

So I'm trying to write an executable JAR for a small subreddit I run.
I have a post that Jsoup connects to and reads all the URLs on that page. In another method, it then connects to all those URLs (that are just comments on the post) and gets the HTML from the comments and saves them to a HashMap.
This is great; however, I am getting a 429 HTTP Error. To resolve this, I added a short 5-second wait. Now I'm getting a SocketTimeoutException, "Read timed out". Once I lowered the wait to 3 seconds, I was bouncing between the two.
Now I run a few Reddit bots with Python and I'm able to make a lot more requests than what I'm doing here. I actually have a single bot that makes thousands of requests every minute. So I know it's possible to make these requests.
My question essentially is, how am I able to make multiple requests to Reddit and avoid the 429 HTTP Error? I'm using Jsoup to connect and read the HTML.
While I'm sure connecting to Reddit via their OAuth2 API will fix the issues, I have no idea how to actually use OAuth2 in Java (I actually use a wrapper in Python, so it's fair to say I don't know at all), and I don't know how to then use that with Jsoup.
"My question essentially is, how am I able to make multiple requests to Reddit and avoid the 429 HTTP Error?"
You answer this yourself:
"While I'm sure connecting to Reddit via their OAuth2 API will fix the issues,"
As specified in the API documentation, you get twice as many requests per second if authenticated using OAuth.
Have you looked around for examples on how to handle OAuth flows in Java?
You might also find it easier to use one of the wrapper libraries for Java, instead of handling all this yourself.
Just set a User-Agent header and you can easily get past it:
User-Agent: super happy flair bot by /u/spladug
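A minimal sketch of setting that header in Java follows. It uses the JDK's `java.net.http` API (Java 11+) instead of Jsoup so that it runs without extra libraries; with Jsoup, the equivalent is `Jsoup.connect(url).userAgent(...)`. The class name `RedditRequests` and the 30-second timeout are my own assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper: builds GET requests that carry a descriptive User-Agent. */
final class RedditRequests {
    static HttpRequest newGet(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)   // identifies the bot to the server
                .timeout(Duration.ofSeconds(30))   // assumed value, tune as needed
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. A header alone does not lift the rate limit, so you still need a pause between requests.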

Android: Getting HTTP 403 Forbidden upon calling https://maps.googleapis.com/maps/api/geocode/xml

I've been developing an Android application since 2012. Until now I've used the free Geocoding web service (API v3) without an API key, so that the limit of 2,500 requests applies per IP and not per key, and it has worked without problems:
https://maps.googleapis.com/maps/api/geocode/xml
Everything worked fine until recently. Now some of my users sometimes get a 403 Forbidden error during this web service call.
Has Google changed something about the use of its web services?
If necessary, I could use the Geocoder class of the Android framework.
But there is another web service that I call:
https://maps.googleapis.com/maps/api/directions/xml
Could this web service also return the 403 error? So far I don't know, because if I get a 403 error from geocode, I don't call the directions web service.
Here's a little information you may want to look over... You can find more about this here.
HTTP 403 response
Requests to the web services may also receive a HTTP 403 (Forbidden)
error. In most cases, this is due to an invalid URL signature. To
verify this, remove the client and signature parameters and try again:
If the response is HTTP 200 (OK), the signature was the problem. This
is not related to usage limits; see Troubleshooting authentication
issues in the Web Services chapter of the Google Maps API for Work
documentation for details. If the response is still a HTTP 403
(Forbidden) error, the signature was not necessarily the problem, it
may be related to usage limits instead. This typically means your
access to the web service has been blocked on the grounds that your
application has been exceeding usage limits for too long or otherwise
abused the web service. Please contact Google Enterprise Support if
you encounter this issue. Requests to all web services require URL
signatures. Requests will also be rejected with a HTTP 403 (Forbidden)
error when including the client parameter but missing the signature
parameter, or vice versa.
Problems
You can exceed the Google Maps API Web Services usage limits by:
Sending too many requests per day.
Sending requests too fast, i.e. too many requests per second.
Sending requests too fast for too long or otherwise abusing the web service.
Exceeding other usage limits, e.g. points per request in the Elevation API.
Solutions
The above problems can be addressed by combining two approaches:
Lowering usage, by optimizing applications to use the web services more efficiently.
Increasing usage limits, when possible, by purchasing additional allowance for your Google Maps API for Work license.
This article will focus on ways of optimizing applications to use the web services more efficiently.
Here's another good link that may just help as well.

How to check DKIM signature of incoming email in Java Google App Engine

I am looking for a way to validate the DKIM signature of the incoming email.
I know how to do it in Java SE, but that uses classes like javax.naming.directory.DirContext to get data from the DNS server, and this class is not whitelisted in App Engine.
Any idea how to communicate with DNS from Google App Engine?
There is a blog post that says:
Once you've configured DKIM, just send an email from your Google Apps account to:
dkim@dkim-test.appspotmail.com
Within minutes, you should get back an email that says "PASS" or "FAIL". If your test passed, you're all set!
I tried it and it works!
It seems like a Google App Engine Application. How is it done?!
I'm the author of the dkim-test app. Unfortunately I did not actually find a native way to do DNS queries in AppEngine. There's a feature request in the AppEngine issues tracker here:
http://code.google.com/p/googleappengine/issues/detail?id=354
The way I got around this for dkim-test was to do an HTTP GET request to http://whatsmyip.us/dns_txt.php?host=google.com (where google.com is the host I want to retrieve TXT records for).
Obviously there are some downsides here. dkim-test is entirely dependent on whatsmyip.us to work; should that service go down, or should they decide to block dkim-test, it would break. Things would also break if they changed the format of the response.
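Whichever way the TXT lookup is done, the host to query comes from the message's DKIM-Signature header: the public key record lives at selector._domainkey.domain, built from the s= and d= tags. Here is a minimal sketch of deriving that name. The class name `DkimQueryName` is hypothetical, and the tag parsing is simplified (it strips all whitespace from values and does not validate the signature).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: derives the DNS name holding a DKIM public key. */
final class DkimQueryName {
    /** Splits a DKIM-Signature header value into its tag=value pairs. */
    static Map<String, String> parseTags(String headerValue) {
        Map<String, String> tags = new HashMap<>();
        for (String part : headerValue.split(";")) {
            int eq = part.indexOf('='); // only the first '=' separates name and value
            if (eq < 0) {
                continue; // skip empty or malformed segments
            }
            String name = part.substring(0, eq).trim();
            String value = part.substring(eq + 1).replaceAll("\\s+", "");
            tags.put(name, value);
        }
        return tags;
    }

    /** Returns "<selector>._domainkey.<domain>" for the given header value. */
    static String txtRecordName(String headerValue) {
        Map<String, String> tags = parseTags(headerValue);
        String selector = tags.get("s");
        String domain = tags.get("d");
        if (selector == null || domain == null) {
            throw new IllegalArgumentException("DKIM-Signature needs both s= and d= tags");
        }
        return selector + "._domainkey." + domain;
    }
}
```

The returned name is what you would pass as the host to whatever TXT lookup is available, such as the HTTP workaround described above.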
