Simulating Mobile Web Browser from java program - java

I am trying to load the mobile version of a web page from a Java program so that I can easily extract some information from it.
In Firefox, after installing the User Agent Switcher plugin, I added a new user agent with the value
"Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/110.07.127; Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413"
After this, if I load http://www.bbc.co.uk/, the mobile version of the page is loaded successfully.
I am trying to do the same from a Java program using the Apache HttpClient library, setting the User-Agent as shown below:
HttpClient httpclient = new DefaultHttpClient();
HttpProtocolParams.setUserAgent(httpclient.getParams(),
"Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/110.07.127; Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413");
However, I am not getting the mobile version of the same link.
I expected the redirect to happen automatically, giving me the mobile version of the page, since the User-Agent has been changed.
Can you please help me resolve this issue?

HttpClient does not support JavaScript redirection.
Please note that HttpClient is not a browser. Importantly, it lacks a UI, a cache, an HTML renderer and a JavaScript engine. To learn more about the scope of HttpClient, please refer to the HttpClient Primer.
Maybe you can try the solutions proposed in these questions:
httpclient + javascript
Apache HttpClient 4 And JavaScript
JavaScript Context in HttpClient

Have you called setFollowRedirects on HttpClient?
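
To illustrate the first answer's point: an HTTP client follows 3xx redirects, but a redirect written into the page itself (a meta refresh or a script assigning to location) has to be found by your own code. Below is a minimal sketch of that idea using only the JDK. The class name ClientSideRedirects and both regular expressions are assumptions made for illustration; they only catch the simplest patterns and are no substitute for a real JavaScript engine.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: looks for a client-side redirect target in an HTML body. */
final class ClientSideRedirects {

    // Matches e.g. <meta http-equiv="refresh" content="0; url=http://m.example.com/">
    // (a target wrapped in its own inner quotes is not handled by this sketch)
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*content\\s*=\\s*[\"']"
            + "[^\"']*?url\\s*=\\s*([^\"'>\\s]+)",
            Pattern.CASE_INSENSITIVE);

    // Matches e.g. window.location = "http://m.example.com/"; or location.href = '...';
    private static final Pattern JS_LOCATION = Pattern.compile(
            "(?:window\\.|document\\.)?location(?:\\.href)?\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Returns the first redirect target found, or empty if neither pattern matches. */
    static Optional<String> findTarget(String html) {
        Matcher meta = META_REFRESH.matcher(html);
        if (meta.find()) {
            return Optional.of(meta.group(1));
        }
        Matcher js = JS_LOCATION.matcher(html);
        if (js.find()) {
            return Optional.of(js.group(1));
        }
        return Optional.empty();
    }
}
```

If findTarget returns a value for the body you downloaded, you would issue a second request to that URL yourself. If the site computes the target dynamically in script, this approach will not see it.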

Related

Can't download image (status=400) (not related to SSL or User-Agent)

I am trying to download an image from Java code. My code is already working fine for tons of other images, but this one refuses to download.
I'm sure the image exists and I am able to view it inside the browser: http://lemonde-emploi.blog.lemonde.fr/files/2017/02/La-Ru%C3%A9e-des-licornes-Hazard.jpg
I'm using the Play framework WS Scala client to download the image. It's just a wrapper around Java's well-known AsyncHttpClient with a Netty implementation.
I'm running the following code, which works fine for many other images, but fails just for this one:
WS
.url(url)
.withQueryString(queryString: _*)
.withHeaders("User-Agent" -> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
.get
I've set exactly the same User-Agent as my local browser, which succeeds in accessing the image.
Here's the server response in debug: it returns a 400 status code.
Any idea why it happens?
The legacy "com.ning" package in your image shows that you're still running AHC1, which has reached end-of-life and which I no longer maintain.
You're most likely hitting an old AHC bug related to URL encoding that's been long fixed (I checked that modern releases of AHC2 work fine with your URL).
Basically, it's time to consider upgrading Play/AHC versions.
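
For readers who want to check how a path like the one above should be percent-encoded, here is a small JDK-only sketch. It does not reproduce the AHC1 bug, whose exact nature is not stated in the answer. It only shows the expected UTF-8 escapes for "é", and, as one common kind of encoding mistake (an assumption, not a diagnosis), what happens when an already-encoded path is encoded a second time. The class name PathEncodingCheck is made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

/** Hypothetical helper for inspecting percent-encoding of URL paths. */
final class PathEncodingCheck {

    /** Builds an http URI from an unencoded path; non-ASCII characters become UTF-8 escapes. */
    static String encode(String host, String rawPath) {
        try {
            // The multi-argument constructor quotes illegal characters, including '%'.
            return new URI("http", host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes percent-escapes as UTF-8. */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String host = "lemonde-emploi.blog.lemonde.fr";
        // Raw path: encoded once, matching the URL in the question.
        System.out.println(encode(host, "/files/2017/02/La-Ru\u00e9e-des-licornes-Hazard.jpg"));
        // Already-encoded path: the '%' signs get escaped again (double encoding).
        System.out.println(encode(host, "/files/2017/02/La-Ru%C3%A9e-des-licornes-Hazard.jpg"));
    }
}
```

Comparing the URL your client actually sends (from a debug log or a capture) against the first line printed by main is a quick way to tell whether the client is mangling the path.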

Java HTTP Request on VPN only website

I am trying to build a website parser for one of our internal websites (accessible only from the company network - we get on the network through Cisco AnyConnect VPN).
I can access the site fine in any browser, but not using HTTP requests. Windows network and sharing center shows that I have two active networks:
The actual internet connection
The company network (without internet access).
The default HTTP client times out, I suppose because it makes the request over the actual internet connection (and the website is not publicly accessible). But using this code:
HttpParams params = httpClient.getParams();
params.setParameter(ConnRoutePNames.LOCAL_ADDRESS, InetAddress.getByName("10.x.x.x"));
I get the following error:
I/O exception caught when connectiong to /10.x.x.x -> {s} -> https://zzz.com:443: Network is unreachable: connect
Also, this might be a stupid test, but I made an HTTP request to a "what is my ip" site and the IP is shown as my Wi-Fi IP, not the VPN IP (which is what I get when I open a browser and browse to a "what is my ip" website). I get the same thing (the wrong IP) when I try this using a GUI-less browser (Jaunt or HTMLUnit).
Please advise if there are any fixes for this.
ConnRoutePNames appears to be deprecated. See if the following works (I haven't tested):
HttpHost proxy = new HttpHost("10.x.x.x", 80);
HttpRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
CloseableHttpClient httpclient = HttpClients.custom()
.setRoutePlanner(routePlanner)
.build();
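
Separately from the proxy idea above, the underlying technique the question is reaching for (choosing the local source address of a connection) can be sketched with plain java.net sockets. This is a minimal illustration, not a fix for the VPN case: the class name LocalBindDemo and its methods are made up, and the self-check runs on the loopback interface only. If the operating system has no route from the chosen address to the target, the connect call will still fail, much like the "Network is unreachable" error in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: open a TCP connection from a chosen local address. */
final class LocalBindDemo {

    /** Connects to remote with the socket's source address forced to localAddress. */
    static Socket connectFrom(InetAddress localAddress, InetSocketAddress remote, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.bind(new InetSocketAddress(localAddress, 0)); // port 0 = any free local port
            socket.connect(remote, timeoutMillis);
            return socket;
        } catch (IOException e) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
            throw new UncheckedIOException(e);
        }
    }

    /** Loopback-only self-check so the sketch runs without a VPN; returns the bound local address. */
    static String demoOnLoopback() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connectFrom(loopback,
                     new InetSocketAddress(loopback, server.getLocalPort()), 2000)) {
            return client.getLocalAddress().getHostAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the real machine you would pass the VPN adapter's address (the 10.x.x.x value from the question) as localAddress and check whether the connect succeeds before wiring the same address into an HTTP library.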

Set user-agent in URLConnection

I am setting the user-agent to test an iOS app from my Java client this way:
urlc.setRequestProperty("user-agent", "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
However, in the JSON response I am getting an error that this app can only be tested on an iOS device (which is a custom response I return when the app is tested from a non-iOS device). So what is the correct way to set the user-agent in Java?
A browser sends a special string, called a user agent, to websites to identify itself. The web server, or JavaScript in the downloaded webpage, detects the client’s identity and can modify its behavior accordingly. In the simplest case, the user agent string includes an application name—for example, Navigator as the application name and 6.0 as the version. Safari on the desktop and Safari on iOS have their own user agent strings, too.
https://developer.apple.com/library/ios/documentation/AppleApplications/Reference/SafariWebContent/OptimizingforSafarioniPhone/OptimizingforSafarioniPhone.html
UserAgent complete set
UserAgent complete set Safari
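
As a concrete illustration: HTTP header names are case-insensitive, so the setRequestProperty call in the question is a valid way to set the header. What stands out is the value, which describes Firefox on Windows. The sketch below sets an iPhone Safari style string instead. The exact string, the class name IosUserAgentRequest and the example URL are assumptions for illustration, and whether the server accepts it depends on what its iOS check actually looks for.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper: prepares a URLConnection that sends an iOS-style User-Agent. */
final class IosUserAgentRequest {

    // Illustrative iPhone Safari string; an assumption, not taken from the question.
    static final String IOS_SAFARI_UA =
            "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 "
            + "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1";

    /** Creates the connection object and sets the header; no network traffic happens here. */
    static URLConnection prepare(String url, String userAgent) {
        try {
            URLConnection conn = new URL(url).openConnection();
            conn.setRequestProperty("User-Agent", userAgent); // must be called before connecting
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = prepare("http://example.com/", IOS_SAFARI_UA);
        // Shows the header value that will be sent once the connection is opened.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```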

UserAgentUtils always giving null browser version

I'm using the UserAgentUtils Java library to extract user agent details from the user agent string of browsers during a PDI transform, but no matter what I do I always get back a null version from the library after parsing the user agent string, even when I can clearly see the version in the string. For example:
String userAgentString = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) capybara-webkit Safari/533.3";
UserAgent userAgent = new UserAgent(userAgentString);
userAgent.getBrowserVersion(); // always comes back null
Two questions. What am I not doing right to get back the data from UserAgentUtils (it doesn't seem to be a bug because there's no history of issues related to this in their bug tracking system)?
Alternatively, is there another Java or JavaScript library I could use to extract the component information from user agent strings? Either one is okay, since I can equally easily use either in the PDI job where this code lives.
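
On the second question, if no library fits, the product/version tokens can be pulled out of the string with a regular expression. This is only a rough sketch and not a replacement for a real user agent parser: the class name ProductTokens and the pattern are assumptions, and it simply collects every name/version pair it sees without deciding which one is "the browser".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts name/version tokens such as "Safari/533.3". */
final class ProductTokens {

    // A name starting with a letter, a slash, then a version starting with a digit.
    private static final Pattern TOKEN =
            Pattern.compile("([A-Za-z][A-Za-z0-9._-]*)/(\\d[A-Za-z0-9._]*)");

    /** Returns the tokens in the order they appear; a repeated name keeps its last version. */
    static Map<String, String> parse(String userAgent) {
        Map<String, String> versions = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(userAgent);
        while (m.find()) {
            versions.put(m.group(1), m.group(2));
        }
        return versions;
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 "
                + "(KHTML, like Gecko) capybara-webkit Safari/533.3";
        System.out.println(parse(ua)); // prints {Mozilla=5.0, AppleWebKit=533.3, Safari=533.3}
    }
}
```

Note that "capybara-webkit" in the sample string carries no version of its own, which may be related to why a parser keyed on browser names returns null for it.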
Are you trying to set the HTTP agent value for Jetty HTTP client requests?
I do this in my User Defined Java Class:
import java.lang.System.*;
...
System.setProperty("http.agent", "my cool crawler, mycoolcrawler#example.com");
Now all your HTTP requests from Kettle will send the User-Agent header with this info.

setRequestProperty (user-agent) not active until after "a while"

I'm writing an HTTP client, which needs to parse the response from a webserver, and I have run into (another) problem.
I found that for one page I was redirected to their mobile content portal:
example: www.example.com/m/public. This is not what I want.
When using a "normal" browser, this redirect did not take place.
After looking into the capture I made, I found that this could be because my user-agent is interpreted as that of a mobile handset browser (user agent was "Java/1.6.0_22").
So I changed the user agent, using this:
URL url = new URL(endpoint);
URLConnection conn = url.openConnection();
conn.setRequestProperty ( "User-agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; InfoPath.1; .NET CLR 3.0.30618)");
To my surprise it still did not work, and I found that I was still sending user-agent "Java/1.6.0_22".
Then I looked a bit closer at my capture, and I saw that after a couple of GET requests (after the first GET I send GETs to the sources on the main page) the user-agent magically changed from java to "Mozilla...".
It seems my setRequestProperty does not become active until after a while...
Has anyone seen this? Any way to get around it?
Thanks!
This SO answer suggests setting the system property beforehand.
I had the same problem. I wrote a web crawler and the web pages it grabbed were mobile versions. I then used both
System.setProperty("http.agent", "");
urlconn.setRequestProperty("User-Agent", "IE/9.0");
and it worked.
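
To check what is actually sent without a packet capture, one option is a throwaway local server that echoes back the User-Agent it receives. The sketch below uses the JDK's built-in com.sun.net.httpserver package. The class name UserAgentEcho and its methods are made up for this example, and it only exercises the setRequestProperty path on a fresh connection, not the http.agent system property.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical test rig: a loopback server that replies with the User-Agent it saw. */
final class UserAgentEcho {

    static HttpServer startEchoServer() {
        try {
            HttpServer server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
            server.createContext("/", exchange -> {
                String ua = exchange.getRequestHeaders().getFirst("User-Agent");
                byte[] body = String.valueOf(ua).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one GET with an explicit User-Agent and returns the response body. */
    static String fetch(String url, String userAgent) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection(Proxy.NO_PROXY);
            conn.setRequestProperty("User-Agent", userAgent); // set before the request is sent
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round trip: returns the User-Agent as the server received it. */
    static String seenByServer(String userAgent) {
        HttpServer server = startEchoServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            return fetch(url, userAgent);
        } finally {
            server.stop(0);
        }
    }
}
```

If seenByServer returns the string you set but the real site still sees "Java/...", the likely place to look is any request your code makes through a different connection object that never had the header set, such as the follow-up GETs for page resources.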
