Java function to detect valid webpage

I am trying to write a Java program that loads the pages pointed to by valid links and reports the other links as broken. My problem is that Java's URL downloads the appropriate page if the URL is valid, but downloads the search-engine results for the URL if the URL is invalid.
Is there a Java function that detects whether a URL resolves to a legitimate page? Thanks very much,
Joel

HttpURLConnection#getResponseCode will give you the HTTP status code.

You can get the HTTP response code for a URL like so:
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public static int getResponseCode(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    if (!(conn instanceof HttpURLConnection)) {
        throw new IllegalArgumentException("not an HTTP url: " + url);
    }
    HttpURLConnection httpConn = (HttpURLConnection) conn;
    return httpConn.getResponseCode();
}
Now the question is: what do you consider a "valid" webpage? For me, a URL is valid if it parses correctly, its protocol is "http" (or "https"), and its response code is in the 200 block, 302 (Found/redirect) or 304 (Not Modified):
public boolean isValidHttpResponseCode(int code) {
    return ((code / 100) == 2) || (code == 302) || (code == 304);
}
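Putting the two pieces together, a link checker could look roughly like the sketch below. It is only a sketch: the LinkChecker class, the isValidWebpage name and the 5-second timeouts are assumptions, and any IOException (unknown host, refused connection, timeout) is simply treated as a broken link.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

final class LinkChecker {

    // Same rule as isValidHttpResponseCode above: any 2xx, 302 or 304 counts as valid.
    static boolean isValidHttpResponseCode(int code) {
        return (code / 100) == 2 || code == 302 || code == 304;
    }

    // Hypothetical helper: true only for a well-formed http/https URL whose
    // response code passes the rule above.
    static boolean isValidWebpage(String link) {
        final URL url;
        try {
            url = new URL(link);
        } catch (MalformedURLException e) {
            return false; // does not even parse as a URL
        }
        String protocol = url.getProtocol();
        if (!protocol.equals("http") && !protocol.equals("https")) {
            return false; // ftp:, file:, mailto: ... are not web pages here
        }
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000); // assumed limits; tune for your crawler
            conn.setReadTimeout(5000);
            return isValidHttpResponseCode(conn.getResponseCode());
        } catch (IOException e) {
            return false; // unknown host, refused connection, timeout, ...
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Checking the protocol before opening a connection keeps non-HTTP links from ever reaching the cast to HttpURLConnection.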

Related

Check if url exists JAVA

I'm trying to check if the url entered by the user actually exists.
Below is what I have tried.
public static Boolean checkURLExists(String urlName)
{
    Boolean urlCheck=false;
    try{
        URL url = new URL(urlName);
        HttpURLConnection.setFollowRedirects(false);
        HttpURLConnection huc = (HttpURLConnection) url.openConnection();
        huc.setRequestMethod("GET");
        int responseCode = huc.getResponseCode();
        String responseMessage = huc.getResponseMessage();
        char a=String.valueOf(Math.abs((long)huc.getResponseCode())).charAt(0);
        if ((a == '2' || a == '3')&& (responseMessage.equalsIgnoreCase("ok")||responseMessage.equalsIgnoreCase("found")||responseMessage.equalsIgnoreCase("redirect"))) {
            System.out.println("GOOD "+responseCode+" - "+a);
            urlCheck=true;
        } else {
            System.out.println("BAD "+responseCode+" - "+a);
        }
    }catch(Exception e){
        e.printStackTrace();
    }
    return urlCheck;
}
The issue with the above code is that it reports http://www.gmail.com, http://www.yahoo.co.in, etc. as invalid URLs, with response code 301 and response message "Moved Permanently", even though they actually redirect to another URL. Is there any way to detect whether a URL will open a page when it is entered in a browser?
Thank you.
Well, the normal behavior of a web browser when it sees a 301 response is to follow the redirect, but you have told your test code NOT to do that. If you want your code to behave (more) like a browser would, change this:
HttpURLConnection.setFollowRedirects(false);
to this:
HttpURLConnection.setFollowRedirects(true);
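Two details are worth knowing here. setFollowRedirects(true) is already the JDK default and is static, so it affects every HttpURLConnection created afterwards, whereas setInstanceFollowRedirects(true) applies to a single connection. Also, HttpURLConnection does not follow a redirect that switches protocol (for example from http: to https:), so if a site redirects to an https address you may still see the 301. The sketch below is one hedged way to write the check with that in mind: it judges by the numeric status code instead of comparing reason phrases such as "OK" or "Found" (servers may word those differently), and it treats a 3xx that survives automatic redirect handling as "the page exists". The class name UrlExistsCheck, the isAcceptable helper and the 5-second timeouts are illustrative assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

final class UrlExistsCheck {

    // 2xx: the page loaded. 3xx after automatic redirect handling: most likely a
    // cross-protocol redirect (http -> https) that HttpURLConnection will not follow,
    // which this sketch counts as "exists".
    static boolean isAcceptable(int code) {
        return code >= 200 && code < 400;
    }

    static boolean checkURLExists(String urlName) {
        HttpURLConnection huc = null;
        try {
            URLConnection conn = new URL(urlName).openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return false; // ftp:, file:, jar: ... are not web pages here
            }
            huc = (HttpURLConnection) conn;
            huc.setInstanceFollowRedirects(true); // affects only this connection
            huc.setRequestMethod("GET");
            huc.setConnectTimeout(5000); // illustrative timeouts
            huc.setReadTimeout(5000);
            // Judge by the numeric code; reason phrases like "Found" vary between servers.
            return isAcceptable(huc.getResponseCode());
        } catch (IOException e) {
            return false; // malformed URL, unknown host, refused connection, timeout, ...
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }
}
```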

Java Facebook server to server get code URL parameter

I've been working on this for a couple of weeks now...
Basically, what I am trying to do is log in to Facebook (authenticate, accept the permissions, etc.), parse the returned "code" URL query param, and use that "code" param to get the FB user access token...
This is the FB_OAuthURL (for this example that is the name of the variable):
https://www.facebook.com/dialog/oauth?client_id=<APP_ID>&redirect_uri=http%3A%2F%2Flocalhost%2Fconnect%2Flogin_success.html&scope=public_profile%2Cpublish_actions%2Cuser_about_me%2Cuser_actions.books%2Cuser_actions.fitness%2Cuser_actions.music%2Cuser_actions.news%2Cuser_actions.video%2Cuser_birthday%2Cuser_education_history%2Cuser_events%2Cuser_games_activity%2Cuser_hometown%2Cuser_religion_politics%2Cuser_status%2Cuser_tagged_places%2Cuser_work_history%2Crsvp_event%2Cuser_relationships%2Cuser_relationship_details%2Cuser_location%2Cuser_likes%2Cuser_posts&state=<RANDOM_NUMBER>
The following is the method that I am using:
import java.net.HttpURLConnection;
import java.net.URL;

public static String getFinalRedirectedUrl(String url) {
    HttpURLConnection connection;
    String finalUrl = url;
    try {
        int responseCode;
        do {
            connection = (HttpURLConnection) new URL(finalUrl).openConnection();
            connection.setInstanceFollowRedirects(false);
            connection.setUseCaches(false);
            connection.setRequestMethod("GET");
            connection.connect();
            responseCode = connection.getResponseCode();
            if (responseCode >= 300 && responseCode < 400) {
                String redirectedUrl = connection.getHeaderField("Location");
                connection.disconnect();
                if (null == redirectedUrl)
                    break;
                // Location may be relative, so resolve it against the current URL
                finalUrl = new URL(new URL(finalUrl), redirectedUrl).toString();
                System.out.println("redirected url: " + finalUrl);
            } else {
                connection.disconnect();
                break;
            }
        } while (responseCode != HttpURLConnection.HTTP_OK);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return finalUrl;
}
However, the result (let's call it FB_REDIRCTURL) is the following:
https://www.facebook.com/login.php?skip_api_login=1&api_key=<APP_ID>&signed_next=1&next=https%3A%2F%2Fwww.facebook.com%2Fv2.5%2Fdialog%2Foauth%3Fredirect_uri%3Dhttp%253A%252F%252Flocalhost%252Fconnect%252Flogin_success.html%26state%3D-<RANDOM_NUMBER>%26scope%3Dpublic_profile%252Cpublish_actions%252Cuser_about_me%252Cuser_actions.books%252Cuser_actions.fitness%252Cuser_actions.music%252Cuser_actions.news%252Cuser_actions.video%252Cuser_birthday%252Cuser_education_history%252Cuser_events%252Cuser_games_activity%252Cuser_hometown%252Cuser_religion_politics%252Cuser_status%252Cuser_tagged_places%252Cuser_work_history%252Crsvp_event%252Cuser_relationships%252Cuser_relationship_details%252Cuser_location%252Cuser_likes%252Cuser_posts%26client_id%3D<APP_ID>%26ret%3Dlogin&cancel_url=http%3A%2F%2Flocalhost%2Fconnect%2Flogin_success.html%3Ferror%3Daccess_denied%26error_code%3D200%26error_description%3DPermissions%2Berror%26error_reason%3Duser_denied%26state%3D-4486902649550591089%23_%3D_&display=page
My two questions are:
1. If I copy and paste this URL into a browser, the browser redirects me and I get the "code" param (again, that is with me manually copying and pasting). How do I get the method to move forward and eventually retrieve the HTTP response that I am looking for?
2. FB_REDIRCTURL says that there was an error in the URL params; however, as I stated, it still works when I copy and paste the URL into a browser. Any ideas why that is?
Thanks everyone, I really appreciate the help.
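On the second question, note that the error parameters in FB_REDIRCTURL (error=access_denied, error_reason=user_denied, ...) sit inside the encoded cancel_url parameter, which is where the user would be sent after cancelling; they do not by themselves mean the request failed. The login.php redirect is most likely where a client without a Facebook session stops, while your browser already has session cookies and continues to login_success.html with the code. Once you do have the final redirect URL (for example from a browser or embedded web view), pulling out the code parameter is plain URL parsing. A minimal sketch using only java.net; the OAuthRedirect class and queryParam helper are hypothetical names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

final class OAuthRedirect {

    // Returns the decoded value of the first query parameter called `name`, or null if absent.
    static String queryParam(String url, String name) {
        String query = URI.create(url).getRawQuery(); // excludes the #fragment
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (decode(kv[0]).equals(name)) {
                return kv.length > 1 ? decode(kv[1]) : "";
            }
        }
        return null;
    }

    // Percent-decodes a query component as UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, queryParam("http://localhost/connect/login_success.html?code=abc%2B1&state=42#_=_", "code") returns "abc+1".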

Read response even though it's 404

Is there a way to read a response of a simple http-get made like below, if getResponseCode() == 404 and thus getInputStream() would throw an exception? I'd prefer to stay with java.net, if it's possible.
HttpURLConnection c = (HttpURLConnection)new URL("http://somesite.com").openConnection();
c.getInputStream();
The thing is, I do have a site that I want to read with Java that responds with 404 but still displays in a browser, because it obviously carries a response body anyway.
You want to use the getErrorStream() method when getResponseCode() returns 404 (or any other 4xx/5xx code):
HttpURLConnection c = (HttpURLConnection)new URL("http://somesite.com").openConnection();
InputStream in = null;
if (c.getResponseCode() >= 400) {
    in = c.getErrorStream();
}
else {
    in = c.getInputStream();
}
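getErrorStream() can return null (for example when the server sent no body), and you usually want the page as text. Here is a minimal sketch that reads whichever stream applies into a String; the BodyReader name is hypothetical, and decoding as UTF-8 is an assumption that a real client should check against the charset in the Content-Type header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

final class BodyReader {

    // Reads the whole response body, using the error stream for 4xx/5xx responses.
    static String readBody(HttpURLConnection c) throws IOException {
        InputStream in = c.getResponseCode() >= 400 ? c.getErrorStream() : c.getInputStream();
        if (in == null) {
            return ""; // getErrorStream() returns null when there is no error body
        }
        try (InputStream body = in) {
            return readAll(body);
        }
    }

    // Copies a stream into a String, assuming the bytes are UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```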

How to get the redirect URL on BlackBerry

I play a video URL using a streaming player on BlackBerry. If the URL returns a 200 status code, it plays successfully.
When I pass the URL below, it returns a 302 HTTP status code, and it won't play in the streaming player.
http://belointr.rd.llnwd.net/KGW/ea398ac7b03a91c2ddf451f1fd7e3ef87f19da59_fl9.mp4?x-belo-vsect=kgw-basketball
When I check the status code, 302 indicates a redirect.
When I open the URL in a browser, it automatically loads the redirect URL below:
http://belointr.vo.llnwd.net/kip0/_pxn=2+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxI1=A21907+_pxL1=begin+_pxM1=+_pxR1=13737+_pxK=20558/KGW/ea398ac7b03a91c2ddf451f1fd7e3ef87f19da59_fl9.mp4?x-belo-vsect=kgw-basketball
How can I get the redirect URL programmatically on BlackBerry?
Please help me.
Retrieve the value of the 'Location' header from the response; it contains the redirect URL. This is standard in the HTTP protocol.
Edit: a real quick sample of how to get the Location header (it could be written a lot better and safer):
URL url = new URL("http://some.url");
int responseCode = -1;
while (responseCode != 200) {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setInstanceFollowRedirects(false); // handle redirects ourselves
    responseCode = conn.getResponseCode();
    if (responseCode > 299 && responseCode < 400) {
        url = new URL(conn.getHeaderField("Location"));
    } else if (responseCode != 200) {
        break; // neither OK nor a redirect: stop instead of looping forever
    }
}
// url now holds the last URL that was requested
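One detail the quick sample glosses over: the Location value is not guaranteed to be absolute. RFC 7231 allows a relative reference, so it is safer to resolve it against the URL that returned the 3xx. The sketch below uses standard java.net, as the sample does; RedirectTarget.resolve is a hypothetical helper.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class RedirectTarget {

    // Resolves a Location header value against the URL that produced the redirect.
    // Absolute values come back unchanged; "/path" and "file" forms are made absolute.
    static URL resolve(URL requestUrl, String location) throws MalformedURLException {
        if (location == null || location.isEmpty()) {
            throw new MalformedURLException("redirect without a Location header");
        }
        return new URL(requestUrl, location);
    }
}
```

For example, resolving "/kip0/b.mp4" against http://example.com/KGW/a.mp4 gives http://example.com/kip0/b.mp4, and "b.mp4" gives http://example.com/KGW/b.mp4.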

Quickest way to get content type

I need to check the content type (whether it's an image, audio or video) of a URL that has been entered by the user. I have code like this:
URL url = new URL(urlname);
URLConnection connection = url.openConnection();
connection.connect();
String contentType = connection.getContentType();
I'm getting the content type, but the problem is that it seems necessary to download the whole file to check its content type, so it takes too long when the file is big. I need to use it in a Google App Engine application, where requests are limited to 30 seconds.
Is there any other way to get the content type of a url without downloading the file (so it could be done quicker)?
Thanks to DaveHowes's answer and some googling about how to send a HEAD request, I got it working this way:
URL url = new URL(urlname);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("HEAD");
connection.connect();
String contentType = connection.getContentType();
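Since the original goal was to tell images, audio and video apart, the Content-Type returned by the HEAD request still has to be mapped to a category. A minimal sketch, assuming the top-level MIME type is enough (MediaKind.classify is a hypothetical helper, and parameters such as charset or codecs are ignored):

```java
import java.util.Locale;

final class MediaKind {

    // Maps a Content-Type header value to "image", "audio", "video" or "other".
    // Only the top-level type is inspected; parameters such as "; codecs=..." are dropped.
    static String classify(String contentType) {
        if (contentType == null) {
            return "other"; // some servers omit the header entirely
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mime.startsWith("image/")) {
            return "image";
        }
        if (mime.startsWith("audio/")) {
            return "audio";
        }
        if (mime.startsWith("video/")) {
            return "video";
        }
        return "other";
    }
}
```

Usage after the HEAD request: MediaKind.classify(connection.getContentType()).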
If the "other" end supports it, could you use the HEAD HTTP method?
Be aware of redirects; I faced the same problem with my remote content check.
Here is my fix:
/**
 * Http HEAD Method to get URL content type
 *
 * @param urlString
 * @return content type
 * @throws IOException
 */
public static String getContentType(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("HEAD");
    if (isRedirect(connection.getResponseCode())) {
        String newUrl = connection.getHeaderField("Location"); // get redirect url from "location" header field
        // logger is assumed to be an SLF4J-style Logger field of the enclosing class
        logger.warn("Original request URL: '{}' redirected to: '{}'", urlString, newUrl);
        return getContentType(newUrl);
    }
    String contentType = connection.getContentType();
    return contentType;
}
/**
 * Check status code for redirects
 *
 * @param statusCode
 * @return true if matched redirect group
 */
protected static boolean isRedirect(int statusCode) {
    if (statusCode != HttpURLConnection.HTTP_OK) {
        if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
            || statusCode == HttpURLConnection.HTTP_MOVED_PERM
            || statusCode == HttpURLConnection.HTTP_SEE_OTHER) {
            return true;
        }
    }
    return false;
}
You could also add a counter (for example, a maxRedirectCount) to avoid an infinite redirect loop, but that is not covered here. This is just meant as inspiration.
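Here is one way to add that counter, as a sketch rather than a drop-in replacement: it follows redirects by hand with setInstanceFollowRedirects(false), stops after maxRedirects hops, resolves relative Location values against the current URL, and also treats 307 and 308 as redirects (HttpURLConnection has no named constants for those two). ContentTypeProbe and the maxRedirects parameter are illustrative names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

final class ContentTypeProbe {

    // 301, 302 and 303 have constants; 307 and 308 do not, so they are literals.
    static boolean isRedirect(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_MOVED_PERM
                || statusCode == HttpURLConnection.HTTP_MOVED_TEMP
                || statusCode == HttpURLConnection.HTTP_SEE_OTHER
                || statusCode == 307
                || statusCode == 308;
    }

    // HEAD request that follows at most maxRedirects redirects, then gives up.
    static String getContentType(String urlString, int maxRedirects) throws IOException {
        URL url = new URL(urlString);
        for (int hop = 0; hop <= maxRedirects; hop++) {
            URLConnection raw = url.openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                throw new IOException("not an HTTP URL: " + url);
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setInstanceFollowRedirects(false); // we count the hops ourselves
            conn.setRequestMethod("HEAD");
            try {
                int code = conn.getResponseCode();
                if (!isRedirect(code)) {
                    return conn.getContentType();
                }
                String location = conn.getHeaderField("Location");
                if (location == null) {
                    throw new IOException("redirect " + code + " without Location from " + url);
                }
                url = new URL(url, location); // Location may be relative
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("more than " + maxRedirects + " redirects starting at " + urlString);
    }
}
```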
I faced a similar task where I needed to check the content type of a URL, and I managed it with Retrofit. First, define an endpoint that you call with the URL you want to check:
@GET
suspend fun getContentType(@Url url: String): Response<Unit>
Then you call it like this to get the content type header:
api.getContentType(url).headers()["content-type"]
