Capture redirect destination in Java [duplicate]

I am accessing web pages through Java as follows:
URLConnection con = url.openConnection();
But in some cases, a URL redirects to another URL, and I want to know which URL the original one redirected to.
Below are the header fields that I got as a response:
null-->[HTTP/1.1 200 OK]
Cache-control-->[public,max-age=3600]
last-modified-->[Sat, 17 Apr 2010 13:45:35 GMT]
Transfer-Encoding-->[chunked]
Date-->[Sat, 17 Apr 2010 13:45:35 GMT]
Vary-->[Accept-Encoding]
Expires-->[Sat, 17 Apr 2010 14:45:35 GMT]
Set-Cookie-->[cl_def_hp=copenhagen; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT, cl_def_lang=en; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT]
Connection-->[close]
Content-Type-->[text/html; charset=iso-8859-1;]
Server-->[Apache]
So at present, I am constructing the redirected URL from the value of the Set-Cookie header field. In the above case, the redirected URL is copenhagen.craigslist.org.
Is there a standard way to determine which URL a particular URL is going to redirect to?
I know that when a URL redirects to another URL, the server sends an intermediate response whose Location header field contains the new URL, but I am not receiving that intermediate response through the url.openConnection() method.

Simply call getURL() on the URLConnection instance after calling getInputStream():
URLConnection con = new URL( url ).openConnection();
System.out.println( "original url: " + con.getURL() );
con.connect();
System.out.println( "connected url: " + con.getURL() );
InputStream is = con.getInputStream();
System.out.println( "redirected url: " + con.getURL() );
is.close();
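A self-contained sketch of the getURL() approach (the class and method names are hypothetical). HttpURLConnection only follows the redirect once the response is requested, so getURL() is read after getInputStream(). Note that it does not follow redirects that switch protocol (for example HTTP to HTTPS); in that case getURL() still reports the original URL and the 3xx response itself is returned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectTarget {
    /** Returns the URL the connection ended up at after following same-protocol redirects. */
    public static URL finalUrlOf(URL start) throws IOException {
        HttpURLConnection con = (HttpURLConnection) start.openConnection();
        con.setInstanceFollowRedirects(true); // the default, stated explicitly
        try (InputStream in = con.getInputStream()) {
            // The request and any redirects happen inside getInputStream(),
            // so getURL() now reports the final location.
            return con.getURL();
        } finally {
            con.disconnect();
        }
    }
}
```

Usage: RedirectTarget.finalUrlOf(new URL("http://example.com/old")) returns the URL of the page that finally answered.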
If you need to know whether a redirect happened before actually fetching its contents, here is sample code:
HttpURLConnection con = (HttpURLConnection)(new URL( url ).openConnection());
con.setInstanceFollowRedirects( false );
con.connect();
int responseCode = con.getResponseCode();
System.out.println( responseCode );
String location = con.getHeaderField( "Location" );
System.out.println( location );

You need to cast the URLConnection to HttpURLConnection and instruct it to not follow the redirects by setting HttpURLConnection#setInstanceFollowRedirects() to false. You can also set it globally by HttpURLConnection#setFollowRedirects().
You then need to handle redirects yourself: check the response code with HttpURLConnection#getResponseCode(), grab the Location header with URLConnection#getHeaderField(), and fire a new HTTP request at it.
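The steps above can be sketched as a loop with a hop limit to guard against redirect cycles. The class and method names are hypothetical, and treating 307 and 308 like 301/302/303 is an assumption (HttpURLConnection has no named constants for 307/308, hence the literals). new URL(current, location) resolves relative Location values against the URL that returned them, and because each hop is a fresh connection, this also crosses from HTTP to HTTPS, which automatic following does not.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ManualRedirects {
    /** Follows redirects by hand and returns every URL visited, starting with {@code start}. */
    public static List<URL> redirectChain(URL start, int maxHops) throws IOException {
        List<URL> chain = new ArrayList<>();
        URL current = start;
        chain.add(current);
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection con = (HttpURLConnection) current.openConnection();
            con.setInstanceFollowRedirects(false); // inspect each 3xx ourselves
            int code = con.getResponseCode();
            String location = con.getHeaderField("Location");
            con.disconnect();
            boolean redirect = code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
            if (!redirect || location == null) {
                return chain; // not a redirect: current is the final URL
            }
            // Resolves both absolute and relative Location values ("/path", "page.html")
            current = new URL(current, location);
            chain.add(current);
        }
        throw new IOException("Too many redirects starting at " + start);
    }
}
```

The last element of the returned list is the final destination; the earlier ones show each intermediate hop.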

public static URL getFinalURL(URL url) {
    try {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // Handle redirects ourselves so that every hop is visible
        con.setInstanceFollowRedirects(false);
        // Browser-like request headers, copied from the browser's developer tools
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36");
        con.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
        con.addRequestProperty("Referer", "https://www.google.com/");
        con.connect();
        int resCode = con.getResponseCode();
        String location = con.getHeaderField("Location");
        con.disconnect();
        if ((resCode == HttpURLConnection.HTTP_SEE_OTHER
                || resCode == HttpURLConnection.HTTP_MOVED_PERM
                || resCode == HttpURLConnection.HTTP_MOVED_TEMP)
                && location != null) {
            // new URL(context, spec) resolves relative locations such as "/path" or "page.html"
            return getFinalURL(new URL(url, location));
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
    return url;
}
To get the "User-Agent" and "Referer" values yourself, open the developer tools of one of your installed browsers (e.g. press F12 in Google Chrome), go to the 'Network' tab and click one of the requests. You should see its details; the request headers are under the 'Headers' sub-tab.

Have a look at the HttpURLConnection class API documentation, especially setInstanceFollowRedirects().

I'd actually suggest using a solid open-source library as an HTTP client. If you take a look at HttpClient from the ASF (Apache HttpComponents), you'll find life a lot easier. It is an easy-to-use, scalable and robust HTTP client.
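The answer recommends Apache HttpClient; as a swapped-in alternative that needs no extra dependency, here is a sketch with the java.net.http.HttpClient built into Java 11 and later. HttpClient.Redirect.NORMAL follows redirects except from HTTPS to HTTP, and HttpResponse.uri() is the URI the response was actually received from. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientRedirects {
    /** Fetches {@code start}, following redirects, and returns the URI of the final response. */
    public static URI finalUri(URI start) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // all redirects except HTTPS -> HTTP
                .build();
        HttpRequest request = HttpRequest.newBuilder(start).GET().build();
        // The body is not needed to learn where we ended up, so discard it
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.uri(); // differs from start if a redirect was followed
    }
}
```

If the intermediate hops matter too, HttpResponse.previousResponse() walks back through them.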

@BalusC I did as you wrote. In my case, I've added the cookie information to be able to reuse the session:
// get the cookie if needed
String cookies = conn.getHeaderField("Set-Cookie");
// open the new connection
conn = (HttpURLConnection) new URL(newUrl).openConnection();
conn.setRequestProperty("Cookie", cookies);
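A sketch of carrying all cookies across, assuming the values come from conn.getHeaderFields().get("Set-Cookie") (the exact key capitalization depends on the server). This matters because getHeaderField("Set-Cookie") returns only the last value when the header is repeated, and because a raw Set-Cookie value also contains attributes such as path and expires that do not belong in a Cookie request header. The helper name is hypothetical; java.net.HttpCookie does the parsing.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

public class CookieCarry {
    /** Turns the Set-Cookie values of one response into a single Cookie request header value. */
    public static String toCookieHeader(List<String> setCookieValues) {
        if (setCookieValues == null) {
            return ""; // the response set no cookies
        }
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                // Keep name=value only; attributes are for the client's bookkeeping
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return String.join("; ", pairs);
    }
}
```

The result can then be passed to conn.setRequestProperty("Cookie", ...) on the next connection.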

Related

curl response is different from the response of java.net.URL

curl -v https://whatwg.org/html
The response header is:
HTTP/1.1 301 Moved Permanently
Location: https://html.spec.whatwg.org/multipage
However, with this Java code:
String link = "https://whatwg.org/html";
URL url = new URL(link);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
int status = conn.getResponseCode();
System.out.println("Response Code ... " + status);
Response Code ... 200
The first and second results are different. What am I doing wrong, and how can I receive the 301?
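A minimal sketch along the lines of the earlier answers: HttpURLConnection follows redirects by default, whereas curl only does so when given -L, so the 200 is most likely the status of the page Java was redirected to. Disabling redirects for the connection should surface the 3xx status and its Location header. The class, nested type and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public final class RawStatus {
    /** Status code and Location header of a single response; location is null if absent. */
    public static final class Result {
        public final int status;
        public final String location;

        Result(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Sends one GET request without following redirects, like curl without -L. */
    public static Result fetchWithoutRedirects(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            // getResponseCode() sends the request; the headers are then available
            return new Result(conn.getResponseCode(), conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }
}
```

Called with new URL("https://whatwg.org/html"), this should report the same status and Location that curl -v showed, assuming the server still answers that way.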

Java - Get filename from "http://www.example.com/something.php?id=1111"

After googling, I found that the file name can be given in the Content-Disposition header field, but this link does not have that header field. Here is the link:
http://www.songspk.link/link/song.php?songid=5558
In a web browser, the link above redirects to
http://sound6.mp3slash.net/indian/mumbai_salsa/mumbaisalsa04%28www.songs.pk%29.mp3
The code I used :
URL url = new URL("http://www.songspk.link/link/song.php?songid=5558");
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0");
conn.setRequestMethod("GET");
conn.setInstanceFollowRedirects(true);
Map<String, List<String>> map = conn.getHeaderFields();
Set<String> keys = map.keySet();
for (String s : keys) {
System.out.println(s);
System.out.println("--->" + map.get(s));
}
} catch (Exception e) {
e.printStackTrace();
} finally {
conn.disconnect();
}
I checked all header fields; here is the list:
null
--->[HTTP/1.1 200 OK]
ETag
--->["98f85f68c5ddcf1:0"]
Date
--->[Wed, 23 Mar 2016 10:01:15 GMT]
Content-Length
--->[5777792]
Last-Modified
--->[Wed, 01 Oct 2014 22:16:54 GMT]
Accept-Ranges
--->[bytes]
Content-Type
--->[audio/mpeg]
X-Powered-By-Plesk
--->[PleskWin]
X-Powered-By
--->[ASP.NET]
Server
--->[Microsoft-IIS/7.5]
I need the original filename. I have no problem in using external library if it can solve my problem.
Just use the getURL() method of the connection; once the response has been requested, it returns the already redirected URL:
System.out.println(conn.getURL());
Output:
http://sound6.mp3slash.net/indian/mumbai_salsa/mumbaisalsa04(www.songs.pk).mp3
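Since the server sends no Content-Disposition header, the last path segment of the final URL is the best available guess for the file name. A sketch with a hypothetical helper name that uses java.net.URI, whose getPath() decodes percent-escapes such as %28 and %29. It assumes the URL string is valid URI syntax; an unencoded space, for example, would make new URI(...) throw URISyntaxException.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlFileName {
    /** Returns the decoded last path segment of a URL, or "" if the path ends in '/'. */
    public static String fileName(String url) throws URISyntaxException {
        String path = new URI(url).getPath(); // percent-escapes are decoded here
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Usage: UrlFileName.fileName(conn.getURL().toString()) after the redirect has been followed.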

Strange behaviour in Get request

NOTE: This contains fixed code.
The following get request works:
curl https://9d3d9934609d1a7d79865231be1ecb23:9432fb76a34a0d46d64a2f4cf81bebd6@smartprice-2.myshopify.com/admin/orders.json
But the following Java code, which I thought did the same thing, returns a 401.
final String url = "https://9d3d9934609d1a7d79865231be1ecb23:9432fb76a34a0d46d64a2f4cf81bebd6@smartprice-2.myshopify.com/admin/orders.json";
final HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setRequestMethod("GET");
con.setRequestProperty("User-Agent", USER_AGENT);
con.setRequestProperty("Content-Type", "application/json");
final String encoded = Base64.encodeBase64String((api+":"+pass).getBytes());
con.setRequestProperty("Authorization", "Basic "+encoded);
System.out.println("\nSending 'GET' request to URL : " + url);
int responseCode = con.getResponseCode();
System.out.println("Response Code : " + responseCode);
What am I missing here?
Are those not identical?
401 means Unauthorized, so nothing surprising there. The difference is that curl takes the username:password part of your URL and automatically sends it as an Authorization header with your request. The Java API does not do this, so you have to do it yourself. The best way to investigate is to run curl with the -v option. In its output, you will see something like:
* Server auth using Basic with user '9d3d9934609d1a7d79865231be1ecb23'
> GET /admin/orders.json HTTP/1.1
> Host: smartprice-2.myshopify.com
> Authorization: Basic OWQzZDk5MzQ2MDlkMWE3ZDc5ODY1MjMxYmUxZWNiMjM6OTQzMmZiNzZhMzRhMGQ0NmQ2NGEyZjRjZjgxYmViZDY=
> User-Agent: curl/7.44.0
> Accept: */*
As you can see, curl automatically adds an HTTP Basic Authorization header to your request. So the correct Java code would be:
final String url = "https://smartprice-2.myshopify.com/admin/orders.json";
final HttpsURLConnection con = (HttpsURLConnection) new URL(url).openConnection();
con.setRequestProperty("Authorization", "Basic OWQzZDk5MzQ2MDlkMWE3ZDc5ODY1MjMxYmUxZWNiMjM6OTQzMmZiNzZhMzRhMGQ0NmQ2NGEyZjRjZjgxYmViZDY=");
con.setRequestMethod("GET");
System.out.println("Response Code : " + con.getResponseCode());
As you can see, there is no need to put the credentials in the URL; the Authorization header (request property) alone is enough. By the way, if you decode the Base64 string OWQzZDk5MzQ2MDlkMWE3ZDc5ODY1MjMxYmUxZWNiMjM6OTQzMmZiNzZhMzRhMGQ0NmQ2NGEyZjRjZjgxYmViZDY=, you get exactly the username:password part of the URL: 9d3d9934609d1a7d79865231be1ecb23:9432fb76a34a0d46d64a2f4cf81bebd6
If you want to build the Authorization header from the credentials automatically, you can use:
final String credentials = DatatypeConverter.printBase64Binary("username:password".getBytes());
con.setRequestProperty("Authorization", "Basic " + credentials);
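A sketch of the same header built with java.util.Base64 (Java 8+), which needs neither Apache Commons Codec nor javax.xml.bind.DatatypeConverter; the latter is no longer bundled with the JDK since Java 11. The class name is hypothetical, and encoding the credentials as UTF-8 is an assumption: a server expecting another charset for non-ASCII characters would need a different choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuth {
    /** Builds an HTTP Basic Authorization header value: "Basic " + base64(user:password). */
    public static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

Usage: con.setRequestProperty("Authorization", BasicAuth.header(api, pass)); before getResponseCode().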
A 401 error means Unauthorized.
You need to either use an Authenticator:
Authenticator.setDefault (new Authenticator() {
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication ("username", "password".toCharArray());
}
});
or set a property:
String basicAuth = "Basic " + new String(new Base64().encode(userpass.getBytes()));
con.setRequestProperty ("Authorization", basicAuth);

How can I browse web site after login using https/http protocol(in Java)

I'm trying to log in to a website using Java, and that part works. Below is the code I used.
String query = "myquery";
URL url = new URL(loginUrl);
HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Content-length", String.valueOf(query.length()));
con.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
con.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0;Windows98;DigExt)");
con.setDoOutput(true);
con.setDoInput(true);
DataOutputStream output = new DataOutputStream(con.getOutputStream());
output.writeBytes(query);
output.close();
DataInputStream input = new DataInputStream( con.getInputStream() );
for( int c = input.read(); c != -1; c = input.read() ) {
System.out.print( (char)c );
// this page returns JavaScript code
}
After this, I want to access another web page in same domain, so I tried below code.
URL url = new URL(anotherUrl);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
... similar to above code ...
But this page asks me to log in again. I think the connection was dropped when I changed the URL. (Only the login page uses HTTPS; the other pages use HTTP.)
How can I fix this?
Someone please help
Keep in mind that HTTP is completely stateless. From an HTTP perspective, "logging in" to a site usually translates to setting cookies. Those cookies are simply HTTP headers, and your browser sends them with each subsequent request. So to maintain the logged-in state, it's up to you to get the cookies from the response headers and send them along with future requests.
Here is how:
Retrieving cookies from a response:
Open a java.net.URLConnection to the server:
URL myUrl = new URL("http://www.hccp.org/cookieTest.jsp");
URLConnection urlConn = myUrl.openConnection();
urlConn.connect();
Loop through response headers looking for cookies:
Since a server may set multiple cookies in a single request, we will need to loop through the response headers, looking for all headers named "Set-Cookie".
String headerName=null;
for (int i=1; (headerName = urlConn.getHeaderFieldKey(i))!=null; i++) {
if (headerName.equalsIgnoreCase("Set-Cookie")) {
String cookie = urlConn.getHeaderField(i);
...
Extract cookie name and value from cookie string:
The string returned by the getHeaderField(int index) method is a series of name=value pairs separated by semicolons (;). The first name/value pair is the actual data we are interested in (i.e. "sessionId=0949eeee22222rtg" or "userId=igbrown"); the subsequent pairs are meta-information used to manage the storage of the cookie (when it expires, etc.). A cookie without attributes has no semicolon at all, so check for one first:
int semicolon = cookie.indexOf(';');
if (semicolon >= 0) cookie = cookie.substring(0, semicolon); // drop attributes such as path and expires
String cookieName = cookie.substring(0, cookie.indexOf("="));
String cookieValue = cookie.substring(cookie.indexOf("=") + 1);
This is basically it. We now have the cookie name (cookieName) and the cookie value (cookieValue).
Setting a cookie value in a request:
Values must be set prior to calling the connect method:
URL myUrl = new URL("http://www.hccp.org/cookieTest.jsp");
URLConnection urlConn = myUrl.openConnection();
Create a cookie string:
String myCookie = "userId=igbrown";
Add the cookie to a request:
Using the setRequestProperty(String name, String value) method, we will add a property named "Cookie", passing the cookie string created in the previous step as the property value.
urlConn.setRequestProperty("Cookie", myCookie);
Send the cookie to the server:
To send the cookie, simply call connect() on the URLConnection for which we have added the cookie property:
urlConn.connect();
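Instead of copying cookies by hand, java.net.CookieManager can be installed as the JVM-wide CookieHandler; HttpURLConnection then stores Set-Cookie values and sends matching cookies with later requests to the same site. A sketch with hypothetical class and method names; ACCEPT_ORIGINAL_SERVER is one policy choice among several. One caveat for the question above: if the login cookie is marked Secure, CookieManager will not send it with the plain-HTTP requests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;

public class SessionCookies {
    /** Installs a JVM-wide cookie store; HttpURLConnection then saves and resends cookies itself. */
    public static CookieManager installCookieStore() {
        // null = default in-memory CookieStore; reject cookies whose domain does not match the host
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** GETs a page and returns its status; cookies are stored and sent by the default handler. */
    public static int get(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try (InputStream in = con.getInputStream()) {
            in.readAllBytes(); // drain the body (Java 9+)
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }
}
```

Call installCookieStore() once at startup; the login request and every later request then share the same session cookies.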

java HttpsURLConnection

I tried to make this curl request executable from Java:
curl -H 'Accept: application/vnd.twitchtv.v2+json' \
-d "channel[status]=testing+some+stuff" \
-X PUT https://api.twitch.tv/kraken/channels/testacc222?oauth_token=6e7b9cyfi8zk1gr8g06eecebnitlcvb
My solution looks like this:
public static void main(String args[]) throws IOException {
String uri = "https://api.twitch.tv/kraken/channels/testacc222?oauth_token=6e7b9cyfi8zk1gr8g06eecebnitlcvb";
URL url = new URL(uri);
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
conn.setDoOutput(true);
conn.setRequestProperty("Accept", "application/vnd.twitchtv.v2+json");
String data = "channel[status]=testing";
OutputStreamWriter out = new OutputStreamWriter(conn.getOutputStream());
out.write(data);
out.flush();
for (Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
System.out.println(header.getKey() + "=" + header.getValue());
}
}
I don't see any problem, yet all it returns is:
Status=[400 Bad Request]
null=[HTTP/1.1 400 Bad Request]
Server=[nginx]
X-Request-Id=[ccc7a9a4a327b18ea4bf496f1f314fb8]
X-Runtime=[0.032328]
Connection=[keep-alive]
X-MH-Cache=[appcache1; M]
Date=[Sun, 06 Jul 2014 14:07:49 GMT]
Via=[1.1 varnish]
Accept-Ranges=[bytes]
X-Varnish=[2778442693]
X-UA-Compatible=[IE=Edge,chrome=1]
Cache-Control=[max-age=0, private, must-revalidate]
Vary=[Accept-Encoding]
Content-Length=[83]
Age=[0]
X-API-Version=[2]
Content-Type=[application/json; charset=utf-8]
I've been trying to figure this out for over a week now and I just don't see the mistake. Any help would be greatly appreciated.
Try examining the response body, as it probably contains details about the rejection. Since the Content-Type specifies utf-8, you can create an InputStreamReader using that:
try (Reader response =
new InputStreamReader(conn.getErrorStream(), StandardCharsets.UTF_8)) {
int c;
while ((c = response.read()) >= 0) {
System.out.print((char) c);
}
}
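A small sketch (hypothetical class and method names) that reads the whole body into a String, switching to getErrorStream() for 4xx/5xx statuses because getInputStream() throws an IOException for those. getErrorStream() may return null when the server sent no body, which is treated as an empty string here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class ResponseBodies {
    /** Reads the response body as UTF-8, using the error stream for 4xx/5xx responses. */
    public static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream stream = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (stream == null) {
            return ""; // no body was sent
        }
        try (InputStream in = stream) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }
}
```

Usage: System.out.println(ResponseBodies.readBody(conn)); after the request body has been written.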
Update: The response body states that the 'channel' parameter isn't present. This is because curl -d sends the data as application/x-www-form-urlencoded, but your code does not declare that Content-Type. Set the request's Content-Type, and use URLEncoder on the parameter name and value separately (encoding the whole string would also escape the '=' between them):
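HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
conn.setDoOutput(true);
conn.setRequestProperty("Accept", "application/vnd.twitchtv.v2+json");
conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
// Encode the name and the value separately so the '=' separator stays intact
String data = URLEncoder.encode("channel[status]", "UTF-8") + "=" + URLEncoder.encode("testing", "UTF-8");
try (OutputStreamWriter out = new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8)) {
    out.write(data);
}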
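For more than one field, a sketch of a form-body builder (hypothetical name) that encodes each name and value separately and joins the pairs with '&'. URLEncoder.encode(String, Charset) needs Java 10 or later. With it, the status field from the curl command, given as "testing some stuff", is encoded as channel%5Bstatus%5D=testing+some+stuff.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    /** Builds an application/x-www-form-urlencoded body from name/value pairs. */
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Encode name and value on their own so '=' and '&' keep their meaning
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

Pass a LinkedHashMap if the field order matters; the result is plain ASCII and can be written to conn.getOutputStream() as-is.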
