I'm writing an HTTP client, which needs to parse the response from a webserver, and I have run into (another) problem.
I found that for one page I was redirected to their mobile content portal:
example: www.example.com/m/public. This is not what I want.
When using a "normal" browser, this redirect did not take place.
After looking into the capture I made, I found that this could be because my user-agent is interpreted as that of a mobile handset browser (user agent was "Java/1.6.0_22").
So I changed the user agent, using this:
URL url = new URL(endpoint);
URLConnection conn = url.openConnection();
conn.setRequestProperty ( "User-agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; InfoPath.1; .NET CLR 3.0.30618)");
To my surprise it still did not work, and I found that I was still sending the user-agent "Java/1.6.0_22".
Then I looked more closely at my capture and saw that after a couple of GET requests (after the first GET, I send GETs for the resources referenced on the main page) the user-agent changed from "Java..." to "Mozilla...".
It seems my setRequestProperty does not become active until after a while...
Has anyone seen this? Any way to get around it?
Thanks!
This SO answer suggests setting the system property beforehand.
I had the same problem. I wrote a web crawler, and the pages it grabbed were the mobile versions. I then set both
System.setProperty("http.agent", "");
urlconn.setRequestProperty("User-Agent", "IE/9.0");
and it worked.
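A minimal sketch of the combination described above, not a definitive fix: the system property is set first, and the header is then set on the connection before it is used. The class name UserAgentDemo and the agent string are made up for illustration. Nothing is sent over the network here; the program only prints the header that the connection object holds.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentDemo {
    // Hypothetical agent string, chosen only for illustration.
    static final String AGENT = "Mozilla/5.0 (compatible; ExampleClient/1.0)";

    // Creates a connection object (no network traffic yet) with the header already set.
    static HttpURLConnection open(String endpoint) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        // Request properties have to be set before the connection is established.
        conn.setRequestProperty("User-Agent", AGENT);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // JVM-wide default, set before any HTTP connection is created.
        System.setProperty("http.agent", AGENT);
        HttpURLConnection conn = open("http://www.example.com/");
        // Shows the header value stored on this connection object.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Checking the stored header this way only confirms what was configured on that one connection; a packet capture, as in the question, is still the way to see what is actually sent.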
I am trying to download an image from Java code. My code is already working fine for tons of other images, but this one refuses to download.
I'm sure the image exists and I am able to view it inside the browser: http://lemonde-emploi.blog.lemonde.fr/files/2017/02/La-Ru%C3%A9e-des-licornes-Hazard.jpg
I'm using the Play framework WS Scala client to download the image. It's just a wrapper around Java's well-known AsyncHttpClient with a Netty implementation.
I'm running the following code, which works fine for many other images but fails for this one:
WS
.url(url)
.withQueryString(queryString: _*)
.withHeaders("User-Agent" -> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
.get
I've set exactly the same User-Agent as my local browser, which can access the image without problems.
Here's the server response in debug: it returns a 400 status code.
Any idea why it happens?
The legacy "com.ning" package in your image shows that you're still running AHC1, which has reached end-of-life and which I no longer maintain.
You're most likely hitting an old AHC bug related to URL encoding that's been long fixed (I checked that modern releases of AHC2 work fine with your URL).
Basically, it's time to consider upgrading Play/AHC versions.
I am setting the user-agent to test an iOS app from my Java client this way:
urlc.setRequestProperty("user-agent", "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
However, in the JSON response I am getting an error saying that this app can only be tested on an iOS device (a custom response I return when the app is tested from a non-iOS device). So what is the correct way to set the user-agent in Java?
A browser sends a special string, called a user agent, to websites to identify itself. The web server, or JavaScript in the downloaded webpage, detects the client’s identity and can modify its behavior accordingly. In the simplest case, the user agent string includes an application name—for example, Navigator as the application name and 6.0 as the version. Safari on the desktop and Safari on iOS have their own user agent strings, too.
https://developer.apple.com/library/ios/documentation/AppleApplications/Reference/SafariWebContent/OptimizingforSafarioniPhone/OptimizingforSafarioniPhone.html
UserAgent complete set
UserAgent complete set Safari
I am trying to load the mobile version of a web page from a Java program so that I can easily extract some information from it.
In Firefox, after installing the User Agent Switcher plugin, I added a new user agent with the value
"Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/110.07.127; Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413"
After this, if I try to load http://www.bbc.co.uk/, the mobile version of the web page loads successfully.
But I am trying to do the same from a Java program using the Apache HttpClient library, setting the User-Agent as shown below:
HttpClient httpclient = new DefaultHttpClient();
HttpProtocolParams.setUserAgent(httpclient.getParams(),
"Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/110.07.127; Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413");
But I am not getting the mobile version of the same link.
I expected the redirect to happen automatically, so that I would get the mobile version of the page once the User-Agent was modified.
Can you please help me to resolve this issue?
HttpClient does not support JavaScript redirection.
Please note that HttpClient is not a browser. Importantly it lacks UI, cache, HTML renderer and a JavaScript engine. To learn more about the scope of HttpClient please refer to HttpClient Primer
Maybe you can try solutions proposed in these questions
httpclient + javascript
Apache HttpClient 4 And JavaScript
JavaScript Context in HttpClient
Have you enabled setFollowRedirects on HttpClient?
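For comparison, here is a rough sketch that sets the same User-Agent and turns on redirect following, but it uses the JDK's own java.net.http client (Java 11+) rather than Apache HttpClient, so it is a swapped-in technique and not the API from the question. The class and method names are made up for illustration. As noted above, this only covers HTTP-level (3xx) redirects; a redirect performed by JavaScript in the page will still not be followed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class MobileRequestDemo {
    // The Symbian user agent string from the question.
    static final String MOBILE_AGENT =
            "Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/110.07.127; "
            + "Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 "
            + "(KHTML, like Gecko) Safari/413";

    // Client that follows HTTP redirects (JavaScript redirects are not handled).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // GET request carrying the mobile User-Agent header.
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", MOBILE_AGENT)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = newClient().send(
                newRequest("http://www.bbc.co.uk/"),
                HttpResponse.BodyHandlers.ofString());
        // The final URI shows whether any HTTP redirect was followed.
        System.out.println(response.statusCode() + " " + response.uri());
    }
}
```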
I'm using the UserAgentUtils Java library to extract user agent details from the user agent string of browsers during a PDI transform, but no matter what I do I always get back a null version from the library after parsing the user agent string, even when I can clearly see the version in the string. For example:
String userAgentString = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) capybara-webkit Safari/533.3";
UserAgent userAgent = new UserAgent(userAgentString);
userAgent.getBrowserVersion(); // always comes back null
Two questions. What am I not doing right to get back the data from UserAgentUtils (it doesn't seem to be a bug because there's no history of issues related to this in their bug tracking system)?
Alternatively, is there another Java or JavaScript library I could use to extract the component information from user agent strings? Either one is okay, since I can equally easily use either in the PDI job where this code lives.
Are you trying to set the HTTP agent value for Jetty HTTP client requests?
I do this in my User Defined Java Class:
import java.lang.System.*;
...
System.setProperty("http.agent", "my cool crawler, mycoolcrawler#example.com");
Now all your HTTP requests from Kettle will send a User-Agent header with this info.
I have fields like:
"GET /?blahblahblah HTTP/1.1" 200 43 "http://www.thesun.co.uk/sol/homepage/" 1 blahblah - "en-gb" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; GTB0.0; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
I'm looking for a Java library or code that can decode this, parse it apart, and provide programmatic access to the components, especially the user agent info. Googling didn't turn up anything useful, but as this info is used all the time, there must be existing systems that do what I require.
You probably want to use Apache HttpCore. The interface you're looking for is HttpRequest, and a simple implementation is BasicHttpRequest.
Constructing the HttpRequest is dependent on how you get the request itself to begin with, but for example, in a little web server I'm working on, it's simply:
DefaultHttpServerConnection serverConnection = new DefaultHttpServerConnection();
serverConnection.bind(socket, params);
HttpRequest httpRequest = serverConnection.receiveRequestHeader();
What you are looking for is an Apache log parser. You may find JXLA useful, or google for "Java apache log parser".
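If a full log parser is more than you need, here is a small sketch using only java.util.regex. It pulls out the double-quoted fields of a line shaped like the one in the question and treats the last one as the user agent. The class and method names are made up, the user agent in main is shortened for readability, and the code assumes that quoted fields never contain escaped quotes, which real logs may violate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineDemo {
    // Matches one double-quoted field; assumes no escaped quotes inside it.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the contents of every quoted field, in order of appearance.
    static List<String> quotedFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    // Assumption: the user agent is the last quoted field on the line.
    static String userAgent(String line) {
        List<String> fields = quotedFields(line);
        return fields.isEmpty() ? null : fields.get(fields.size() - 1);
    }

    public static void main(String[] args) {
        String line = "\"GET /?blahblahblah HTTP/1.1\" 200 43 "
                + "\"http://www.thesun.co.uk/sol/homepage/\" 1 blahblah - \"en-gb\" "
                + "\"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)\"";
        System.out.println(quotedFields(line).get(0)); // the request line
        System.out.println(userAgent(line));           // the user agent field
    }
}
```

The unquoted fields (status code, byte count and so on) are ignored here; a dedicated log parser handles those as well as the quoting edge cases.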