I am trying to make a request to a webpage that requires cookies. I'm using HttpURLConnection, but the response always comes back saying:
<div class="body"><p>Your browser's cookie functionality is turned off. Please turn it on.
How can I make the request so that the queried server thinks I have cookies turned on? My code goes something like this:
private String readPage(String page) throws MalformedURLException {
    StringBuilder sb = new StringBuilder();
    try {
        URL url = new URL(page);
        HttpURLConnection uc = (HttpURLConnection) url.openConnection();
        uc.connect();
        InputStream in = uc.getInputStream();
        int v;
        while ((v = in.read()) != -1) {
            sb.append((char) v);
        }
        in.close();
        uc.disconnect();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sb.toString();
}
You need to add a CookieHandler to the system for it to handle cookies. Before Java 6 there is no CookieHandler implementation in the JRE, so you have to write your own. On Java 6 or later, you can do this:
CookieHandler.setDefault(new CookieManager());
URLConnection's cookie handling is really weak. It barely works and doesn't handle all the cookie rules correctly. You should use Apache HttpClient if you are dealing with sensitive cookies, such as those used for authentication.
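For illustration, here is a minimal sketch of the built-in approach, assuming Java 6+ and a placeholder URL; the CookieManager captures the Set-Cookie headers from the first response and replays them automatically on later requests to the same host:

import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;

public class CookieDemo {
    public static void main(String[] args) throws Exception {
        // Install a JVM-wide cookie store; ACCEPT_ALL keeps every cookie the server sets.
        CookieManager manager = new CookieManager();
        manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);

        URL url = new URL("http://example.com/page");   // placeholder URL

        // First request: any Set-Cookie headers are stored by the CookieManager.
        HttpURLConnection first = (HttpURLConnection) url.openConnection();
        first.getInputStream().close();

        // Second request: the stored cookies are attached automatically.
        HttpURLConnection second = (HttpURLConnection) url.openConnection();
        InputStream in = second.getInputStream();
        // ... read the page as in readPage() above ...
        in.close();
    }
}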
I don't think the server can determine on the first request whether a client supports cookies, so it probably sends a redirect first. Try disabling redirects:
uc.setInstanceFollowRedirects(false);
Then you will be able to read the cookies from the response and send them back (if you need to) on the next request:
uc.getHeaderFields()
// get cookie (set-cookie) here
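A rough sketch of that manual approach, with placeholder URLs; note that real Set-Cookie values also carry attributes (Path, Expires, ...) that should be stripped before the value is sent back:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class ManualCookies {
    public static void main(String[] args) throws Exception {
        // First request: don't follow the redirect, just collect the cookies.
        HttpURLConnection first = (HttpURLConnection) new URL("http://example.com/login").openConnection();
        first.setInstanceFollowRedirects(false);
        first.connect();

        List<String> setCookies = first.getHeaderFields().get("Set-Cookie");
        StringBuilder cookieHeader = new StringBuilder();
        if (setCookies != null) {
            for (String c : setCookies) {
                if (cookieHeader.length() > 0) {
                    cookieHeader.append("; ");
                }
                cookieHeader.append(c.split(";", 2)[0]);   // keep only the name=value part
            }
        }

        // Second request: send the collected cookies back by hand.
        HttpURLConnection second = (HttpURLConnection) new URL("http://example.com/page").openConnection();
        second.setRequestProperty("Cookie", cookieHeader.toString());
        second.connect();
        System.out.println("[" + second.getResponseCode() + "]");
    }
}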
URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; pl; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2");
conn.addRequestProperty("Referer", "http://xxxx");
conn.addRequestProperty("Cookie", "...");
If you're trying to scrape large volumes of data after a login, you may even be better off with a scripted web scraper like WebHarvest (http://web-harvest.sourceforge.net/). I've used it to great success in some of my own projects.
Related
What I need to do is send a POST request to a specific URL with two parameters, and when the request is sent, I need to redirect the user to that link so that he can access the functionality.
So far, what I have managed to do from various examples is this:
private void postRemoteAdvisoryLink() throws IOException {
    URL obj = new URL(KdrmApplicationContext.getRemoteAdvisoryUrlPath());
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();
    con.setConnectTimeout(60000);
    con.setRequestMethod("POST");
    con.setRequestProperty("User-Agent", "Mozilla/5.0");
    con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");

    // For POST only - start
    con.setDoOutput(true);
    OutputStream os = con.getOutputStream();
    os.write(("?auth=ssor&TransportKey=" + ssorTransportKey).getBytes());
    os.flush();
    os.close();

    int responseCode = con.getResponseCode();
}
The problem is that I now get a connection timeout when executing the OutputStream os = con.getOutputStream(); line. Also, I still have no idea how to redirect the user once the request has completed.
Any ideas?
Using the basic Java URL classes requires you to handle the details of the HTTP protocol manually - it's better to use a library like Apache HttpComponents, which deals with the underlying protocol for you. Some examples, including POST requests, can be found on their website.
Given the original question, the timeout is most likely caused by the host not responding or by your Java application being unable to reach the given URL (for example, because no proxy is configured).
If you want to redirect based on the answer, you need to check the response headers and the HTTP status: if the status is 302, there should be a Location header containing the URL you should make the next request to.
Before getting an OutputStream, also make sure to set the Content-Length header (and ideally the Content-Type header as well).
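Putting those points together, here is a hedged sketch of what that could look like with plain HttpURLConnection; the URL and form parameters are placeholders, not the asker's real values:

import java.net.HttpURLConnection;
import java.net.URL;

public class PostThenRedirect {
    public static void main(String[] args) throws Exception {
        byte[] body = "auth=ssor&TransportKey=12345".getBytes("UTF-8");   // placeholder parameters

        HttpURLConnection con = (HttpURLConnection) new URL("http://example.com/advisory").openConnection();
        con.setRequestMethod("POST");
        con.setInstanceFollowRedirects(false);                       // handle any redirect ourselves
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        con.setRequestProperty("Content-Length", String.valueOf(body.length));
        con.setDoOutput(true);
        con.getOutputStream().write(body);

        int status = con.getResponseCode();
        if (status == HttpURLConnection.HTTP_MOVED_TEMP) {           // 302
            // The Location header holds the URL the user should be sent to next.
            String location = con.getHeaderField("Location");
            System.out.println("Redirect the user to: " + location);
        }
    }
}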
I am trying to retrieve some HTML text from a list of pages returned by Google. Most of them work fine, but URLs such as https://www.google.com/patents/US6034687 always give a 401 error; see below:
Server returned HTTP response code: 401 for URL: https://www.google.com/patents/US6034687
I am using Java, and I did look up this error code. It seems authentication related, but this kind of URL can be accessed from any browser without asking for a login. So I am confused: why does only this kind of URL not work for me?
Here is my code for retrieving the HTML:
URL u = new URL(url);
StringBuilder html = new StringBuilder();
HttpURLConnection conn = (HttpURLConnection) u.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept", "text/html");
BufferedReader br;
try {
    br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String out = "";
    while ((out = br.readLine()) != null) {
        // System.out.println(out);
        html.append(out + "\n");
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Any ideas?
Thanks.
Try sending a User-Agent header in the request. That 401 status is misleading. Some servers do not allow requests from non-browser clients.
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.2; rv:21.0) Gecko/20100101 Firefox/21.0");
BTW, when you do openConnection() for an https scheme, the return value is HttpsURLConnection, which extends HttpURLConnection.
The request requires user authentication. The response MUST include a WWW-Authenticate header field containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field. If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication".
I use simple code to get the HTML of http://www.ip-adress.com, but it fails with HTTP code 403.
When I try other websites, such as google.com, in the program it works, and I can also open www.ip-adress.com in a browser. Why can't I access it from my Java program?
public class urlconnection {
    public static void main(String[] args) {
        StringBuffer document = new StringBuffer();
        try {
            URL url = new URL("http://www.ip-adress.com");
            URLConnection conn = url.openConnection();
            BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line = null;
            while ((line = reader.readLine()) != null) {
                document.append(line + " ");
            }
            reader.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(document.toString());
    }
}
java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.ip-adress.com/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at urlconnection.main(urlconnection.java:14)
This is the line you require:
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
Refer to this.
The web server can detect that you are not actually accessing it with a regular browser, so it rejects your request. There are ways to fake that and trick the server into thinking you are a browser.
I suppose the site checks the User-Agent header and blocks what it thinks is a robot. You need to mimic a normal browser. Check this solution: Setting user agent of a java URLConnection, or use Commons HttpClient and set the user agent.
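As a rough sketch of the HttpClient variant (assuming HttpClient 4.3+; the browser string and URL handling are just illustrative):

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class UserAgentFetch {
    public static void main(String[] args) throws Exception {
        // Build a client that sends a browser-like User-Agent on every request.
        CloseableHttpClient client = HttpClients.custom()
                .setUserAgent("Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20100101 Firefox/21.0")
                .build();

        CloseableHttpResponse response = client.execute(new HttpGet("http://www.ip-adress.com"));
        try {
            // If the 403 was User-Agent based, this should now report 200 OK.
            System.out.println(response.getStatusLine());
        } finally {
            response.close();
            client.close();
        }
    }
}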
I don't believe that this is fundamentally a Java problem. You're doing the right thing to make an HTTP connection, and the server is doing "the right thing" from its perspective by responding to your request with a 403 response.
Let's be clear about this - the response you're getting is due to whatever logic is being employed by the target webserver.
So if you were to ask "how can I modify my request so that http://www.ip-adress.com returns a 200 response", people may be able to come up with workarounds that keep that server happy. But this is a host-specific process; your Java code is arguably correct, though it should have better error handling, because you can always get non-2xx responses.
Try changing the connection's User-Agent header to something browser-like; most of the time I use Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1.
I am currently pen-testing a web application and came across an interesting phenomenon. During my testing sessions, I gathered URLs using a proxy. Now I want to test my URL list for anonymous access, so I wrote this little tool:
public static void main(String[] args) {
    try {
        TrustAllCerts.disableCertChecks();
        FileReader fr = new FileReader(new File("urls.txt"));
        BufferedReader br = new BufferedReader(fr);
        String urlStr = br.readLine();
        while (urlStr != null) {
            if (urlStr.trim().length() > 0) {
                URL url = new URL(urlStr);
                HttpsURLConnection urlc = (HttpsURLConnection) url.openConnection();
                urlc.connect();
                if (urlc.getResponseCode() == HttpURLConnection.HTTP_OK) {
                    System.out.println(urlStr);
                } else {
                    System.out.println("[" + urlc.getResponseCode() + "] " + urlStr);
                }
                urlc.disconnect();
            }
            urlStr = br.readLine();
        }
        br.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
It does basically nothing but open a URL connection to each given URL and test the HTTP response code (actually I implemented some more tests, e.g. whether I'm being redirected to a login page). However, the problem is that this specific application (some custom MS SQL Server Reporting Services) is configured to use NTLM WWW authentication. If I try to access some of the URLs using Firefox, I get a 401 Unauthorized plus a login dialog. Internet Explorer performs NTLM auth in the background and grants access. It seems that the Java URLConnection (or URL) class does the same, because I am getting no 401. Is there a way to disable implicit NTLM authentication in Java? This is a bad pitfall for me.
I think the Java Networking Documentation is the best resource. Setting http.auth.preference="basic" should get you what you want, assuming you don't need Digest or something else. I'm not sure whether you can go beyond that to disable NTLM entirely.
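A minimal sketch of that suggestion, assuming the standard http.auth.preference system property and a placeholder report URL:

import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class NoNtlmProbe {
    public static void main(String[] args) throws Exception {
        // Ask the built-in HTTP stack to prefer Basic authentication, so it
        // no longer answers NTLM challenges transparently.
        System.setProperty("http.auth.preference", "basic");

        URL url = new URL("https://reports.example.com/some/report");   // placeholder URL
        HttpsURLConnection urlc = (HttpsURLConnection) url.openConnection();
        urlc.connect();

        // A protected URL should now surface 401 instead of being silently authenticated.
        System.out.println("[" + urlc.getResponseCode() + "] " + url);
        urlc.disconnect();
    }
}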
Another thing to consider is other Java HTTP client implementations, like Apache's or Google's.
I'm not sure that this will help, but I've been stumped by the opposite.
I wanted NTLM auth to take place, so on my local machine I use a free app called CNTLM. It's a local proxy server that will forward (and NT auth) incoming requests. Good for apps that can't use NTLM proxies.
I'm sorry, I know this isn't answering your question, but maybe it proves helpful to someone out there! :)
I made an app for Android which uses the C2DM service from Google. I made a server simulator from some tutorials and it works fine. My problem is that I then tried to build a Java servlet. The Android device receives the message fine and saves the Registration ID, but when I try to send an HTTPS POST request to the Google C2DM server, it always gets a SocketTimeoutException: Timeout while fetching: https://android.clients.google.com/c2dm/send.
I don't get why this happens when the same code works on the Android device. Here is the code:
// The AuthToken from Google ClientLogin
String auth_key = TOKEN;

StringBuilder postDataBuilder = new StringBuilder();
// Some parameters to pass; I've checked and they are correct - it works with Fiddler.
postDataBuilder.append(PARAM_REGISTRATION_ID).append("=").append(REGISTRATION_ID);
postDataBuilder.append("&").append(PARAM_COLLAPSE_KEY).append("=").append("0");
postDataBuilder.append("&").append("data.payload").append("=").append(URLEncoder.encode(message, UTF8));

byte[] postData = postDataBuilder.toString().getBytes(UTF8);

URL url = new URL("https://android.clients.google.com/c2dm/send");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
conn.setUseCaches(false);
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8");
conn.setRequestProperty("Content-Length", Integer.toString(postData.length));
conn.setRequestProperty("Authorization", "GoogleLogin auth=" + auth_key);

OutputStream out = conn.getOutputStream();
out.write(postData);
out.close();

int responseCode = conn.getResponseCode();
// Here comes the error processing, but I can't reach it because of the exception.
if (responseCode == 401 || responseCode == 403) {
    // ....
}
Thanks for your help :).
The first obvious thing to check is - and if you have already thought of this, I apologise - are you behind a proxy server, e.g. a company firewall? If so, a timeout is exactly the symptom I'd expect with the above code. (This catches me out all the time!)
With the latter half of your code (from the HttpURLConnection declaration on) unmodified, I see a timeout on my system (which is behind a company firewall); with two changes I get a 200 OK back:
addition of a Proxy object passed to the HttpURLConnection factory, as follows:
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("...", 8080));
HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
accepting the C2DM server's certificate, which wasn't trusted by my JVM. For test purposes I overrode the default hostname verifier and TrustManager as described in Trusting all certificates using HttpClient over HTTPS; a test-only sketch of that idea follows below. For production you should look at a more secure solution.
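For reference, a minimal test-only sketch of that second change using the plain JSSE APIs (this is not the linked answer verbatim, and it disables TLS validation entirely, so never ship it):

import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TrustEverythingForTests {
    public static void install() throws Exception {
        // A TrustManager that accepts any certificate chain without checking it.
        TrustManager[] trustAll = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
                public void checkClientTrusted(X509Certificate[] chain, String authType) { }
                public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            }
        };

        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, trustAll, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());

        // Also skip host name verification, again for test purposes only.
        HttpsURLConnection.setDefaultHostnameVerifier(new HostnameVerifier() {
            public boolean verify(String hostname, SSLSession session) { return true; }
        });
    }
}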
Another thing I spotted: it doesn't seem to matter, but http://code.google.com/android/c2dm/index.html#push says to post to https://android.apis.google.com/c2dm/send, not android.clients.google.com - just something to be aware of that might break in future.
I faced the same problem. I tried:
URL url = new URL("http://android.apis.google.com/c2dm/send");
instead of:
URL url = new URL("https://android.apis.google.com/c2dm/send");
and it worked for me.