Checking the status of a web page [duplicate] - java

This question already has answers here:
How to use java.net.URLConnection to fire and handle HTTP requests
(12 answers)
Closed 9 years ago.
I need to write a program that takes a valid URL of a web page, such as www.stackoverflow.com/questions, and its IP address equivalent. The program should then fetch that page and return its status code, such as 200 OK or 404 NOT FOUND. If the page isn't reachable, it should return a message explaining the situation.
Here’s what I have done so far:
interface Result {
public boolean ok ();
public String message (); }
class Page {
public Result check ( String wholeURL ) throws Exception {
throw new Exception ( "Not sure about the rest"); } }
Also if I were to check a page like http://www.stackoverflow.com I’ll create an instance of Page and then do something like this:
Page page = new PageImplementation ();
Result result = page.check ( "http://www.stackoverflow.com:60" );
if ( result.ok () ) { ... }
else { ... }
The returned object is an instance of Result. Its "ok" method should return true when the status code is 200 OK and false otherwise. Its "message" method should return the status code as a string.

Have a look at the HttpURLConnection class within the JDK or use Apache Http Components.
Basically you try to connect to the url and check the response header or wait for a timeout if the server isn't reachable at all.
With HttpURLConnection it might look like this:
URL url = new URL("http://www.stackoverflow.com");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
connection.connect();
int httpStatusCode = connection.getResponseCode(); //200, 404 etc.
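As a rough sketch, the HttpURLConnection approach could be wrapped into the ok()/message() shape from the question like this. The names StatusCheck, StatusResult and fromStatus, and the 5-second timeouts, are assumptions made for illustration and do not come from the original code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

final class StatusCheck {

    /** Small value object with the same two methods as the Result interface in the question. */
    static final class StatusResult {
        private final boolean ok;
        private final String message;

        StatusResult(boolean ok, String message) {
            this.ok = ok;
            this.message = message;
        }

        boolean ok() { return ok; }
        String message() { return message; }
    }

    /** Pure helper: ok is true only for 200; the message is "code reason". */
    static StatusResult fromStatus(int code, String reason) {
        String text = (reason == null) ? String.valueOf(code) : code + " " + reason;
        return new StatusResult(code == HttpURLConnection.HTTP_OK, text);
    }

    /** Connects to the URL and reports its status, or explains why it could not be reached. */
    static StatusResult check(String wholeURL) {
        try {
            URLConnection raw = new URL(wholeURL).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return new StatusResult(false, "Not an HTTP URL: " + wholeURL);
            }
            HttpURLConnection connection = (HttpURLConnection) raw;
            connection.setConnectTimeout(5000); // assumed timeouts, in milliseconds
            connection.setReadTimeout(5000);
            try {
                return fromStatus(connection.getResponseCode(), connection.getResponseMessage());
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            // Covers malformed URLs, unknown hosts, refused connections and timeouts.
            return new StatusResult(false, "Unreachable: " + e);
        }
    }
}
```

Note that a URL without a scheme, such as www.stackoverflow.com/questions, is rejected by java.net.URL as malformed, so the caller would need to prepend http:// first.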

You can use a library such as Apache Commons HttpClient:
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;
..........
public Result check ( String fullURL ) throws Exception {
HttpClient client = new HttpClient();
GetMethod method = new GetMethod(fullURL);
int statusCode = client.executeMethod(method);
// Build and return your Result object based on statusCode
}

Related

HtmlUnit, how to check if every asset was loaded?

I am using HtmlUnit for integration-testing.
In HtmlUnit, WebResponse::getStatusCode() gives the HTTP status code of the currently loaded page. For example, if my page is google.com, the status code will usually be 200.
I am wondering whether HtmlUnit can check that all required assets (*.css, *.js) were loaded successfully (200 <= statusCode < [300|400]).
In my use case, the tested page works fine, but a .css file is missing, so the layout is broken. I would like to assert in my test case that each required asset can be loaded.
You can intercept all communication and store information about all assets.
Something like:
new WebConnectionWrapper(webClient) {
@Override
public WebResponse getResponse(WebRequest request) throws IOException {
WebResponse response = super.getResponse(request);
URL url = request.getUrl();
int status = response.getStatusCode();
// store the status
return response;
}
};
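One possible way to fill in the "store the status" step is a small collector that the wrapper calls for every response and that the test inspects afterwards. This is only a sketch: the class AssetStatusLog and its methods are hypothetical names, and treating everything from 200 up to (but not including) 400 as success is an assumption you may want to tighten.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AssetStatusLog {
    // Keeps the last status seen per URL, in request order.
    private final Map<String, Integer> statuses = new LinkedHashMap<>();

    /** Intended to be called from getResponse(...) with the request URL and response status. */
    synchronized void record(String url, int status) {
        statuses.put(url, status);
    }

    /** Returns "url -> status" for every recorded response outside 200..399. */
    synchronized List<String> failures() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : statuses.entrySet()) {
            int status = entry.getValue();
            if (status < 200 || status >= 400) {
                failed.add(entry.getKey() + " -> " + status);
            }
        }
        return failed;
    }
}
```

Inside the wrapper you would call something like log.record(url.toString(), status), and at the end of the test assert that log.failures() is empty.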

Java Get URL Header Alone, Body Not Required

I'm using the code below to get the cache-control value from the headers of a given URL. I don't want to fetch the body. The request below takes 800 ms to process. Can the code be changed to avoid this? I'm using Google App Engine for development, and I would prefer not to add an extra jar.
URL obj;
URLConnection conn = null;
String noTransform = "";
obj = new URL(url);
conn = obj.openConnection();
noTransform = conn.getHeaderField("cache-control");
if (noTransform !=null && (noTransform.contains("no-transform") || noTransform.contains("private") )){
news.setIsGoogleLiteURL("false");
return news;
}
else {
news.setIsGoogleLiteURL("false");
return news;
}
Instead of making a GET request, try making a HEAD request.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.4
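A minimal sketch of what the HEAD request could look like with the JDK's HttpURLConnection, so no extra jar is needed. The class name HeadRequest, the method names and the 3-second timeouts are assumptions for illustration. Whether this actually reduces the 800 ms depends on how much of that time is spent transferring the body rather than waiting for the server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class HeadRequest {

    /** Fetches a single response header with HEAD, so the server sends no body. */
    static String fetchHeader(String url, String headerName) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(3000); // assumed timeouts, in milliseconds
            conn.setReadTimeout(3000);
            return conn.getHeaderField(headerName); // null if the header is absent
        } finally {
            conn.disconnect();
        }
    }

    /** The same cache-control test as in the question, separated so it needs no network. */
    static boolean isNoTransformOrPrivate(String cacheControl) {
        return cacheControl != null
                && (cacheControl.contains("no-transform") || cacheControl.contains("private"));
    }
}
```

A call such as HeadRequest.fetchHeader(url, "cache-control") would then replace the openConnection()/getHeaderField pair in the question.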

BufferedReader inconsistently hanging on my socket's inputstream

I am writing a Java HTTP server. I thought the entire server was working and it is using threading. However, I'm realizing that the piece of code that reads the request into a BufferedReader is not working consistently.
Here is the code that reads an incoming request:
private String receive(WebSocket webSocket) throws IOException {
int chr;
System.out.println("Receiving!");
StringBuffer buffer = new StringBuffer();
while ( (chr = webSocket.in().read() ) != -1) {
buffer.append((char) chr);
if ( !webSocket.in().ready())
break;
}
return buffer.toString();
}
My Websocket class just wraps the Socket and provides an in and an out. I did this so that I could mock out the socket and test my server.
The Websocket class looks like this:
package http.server.socket;
import java.io.*;
import java.net.Socket;
public class SystemSocket implements WebSocket {
private Socket theConnection;
private BufferedReader in;
private OutputStream out;
public SystemSocket(Socket theConnection) throws IOException {
this.theConnection = theConnection;
in = new BufferedReader(new InputStreamReader(theConnection.getInputStream()));
out = new BufferedOutputStream(theConnection.getOutputStream());
}
public BufferedReader in() throws IOException {
return in;
}
public OutputStream out() throws IOException {
return out;
}
public void close() throws IOException {
in.close();
out.close();
theConnection.close();
}
}
The problem is that with each url the user enters in a browser, two requests are made - one for the page requested and one for the favicon. Sometimes - it seems - the favicon request is not coming in and the thread hangs.
Here's some debugging information I have printing to the console when things go right:
Receiving!
Receiving!
REQUEST STRING = GET /color_picker.html HT
[20130821 20:29:23] REQUEST: http://localhost:5000/color_picker.html
[20130821 20:29:23] PAGE RENDERED
REQUEST STRING = GET /favicon.ico HTTP/1.1
[20130821 20:29:23] REQUEST: http://localhost:5000/favicon.ico
[20130821 20:29:23] PAGE RENDERED
The "Receiving" message is getting printed whenever the request is getting read. So, in this case, the "Receiving" message got printed twice, two requests came in and two things were rendered. But then, the same page (but at a different time) will do this (after about 10 seconds):
Receiving!
Receiving!
REQUEST STRING = GET /color_picker.html HTTP/1.1
[20130821 20:41:25] REQUEST: http://localhost:5000/color_picker.html
[20130821 20:41:25] PAGE RENDERED
REQUEST STRING =
Exception in thread "ServerThread" java.lang.ArrayIndexOutOfBoundsException: 1
at http.request.Parser.setRequestLineData(Parser.java:42)
at http.request.Parser.setRequestHash(Parser.java:27)
at http.request.Parser.parse(Parser.java:13)
at http.request.Request.get(Request.java:18)
at http.server.ServerThread.run(ServerThread.java:39)
All the subsequent errors are because the request string is null. But I can't figure out why the Request string is null. I can't even figure out how to debug.
Can anyone help??
Also important to note that if the second request string doesn't come in right away, the user can request a new url and it will cause the second hung process to complete (so then the fourth request url will be what hangs). So, it's only when the user stops requesting things, on the last request after about 10 seconds, I will get the error. Sometimes I can request 20 different pages and it's only after I stop requesting pages and wait a few seconds, that I will see an error. I think this is what is happening??
UPDATE:
Per the request, here is the setRequestLineData() method:
private void setRequestLineData() {
requestHash = new HashMap<String, String>();
if (requestLineParts.length == 3) {
requestHash.put("httpMethod", requestLineParts[0]);
requestHash.put("url", requestLineParts[1]); //line 42
requestHash.put("httpProtocol", requestLineParts[2]);
}
else {
requestHash.put("httpMethod", requestLineParts[0]);
requestHash.put("url", requestLineParts[1]);
requestHash.put("queryString", requestLineParts[2]);
requestHash.put("httpProtocol", requestLineParts[3]);
}
}
UPDATE:
I think I figured out more about what is going on here with my mentor's help. His thought is that once a request is received, the browser starts another request right away to reduce load time for the next request. This sounds plausible to me, since I can load page after page after page, but it's only about 10 seconds after the last page is requested that I get an error. Currently, I'm handling this with a custom exception, but I am working on a better solution. Thanks for all the help, guys!
ready() isn't a valid test for end of message. It only tells you whether there is data available to be read without blocking. TCP isn't a message-oriented protocol, it is a byte-stream protocol. If you want messages you must implement them yourself, e.g. as lines, length-value tuples, type-length-value tuples, serialized objects, XML documents, ...
There are few if any correct uses of ready() (or available()), and this isn't one of them.
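For HTTP specifically, the framing is already defined: the request line and headers are lines, ended by an empty line, and any body has the length given by Content-Length. A rough sketch of reading a request that way, without ready(), might look like the following. The class RequestReader and its method names are hypothetical, and IOException is wrapped in UncheckedIOException only to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class RequestReader {

    /**
     * Reads the request line and headers, stopping at the blank line that ends
     * the header section. Returns an empty list if the peer closed the
     * connection without sending anything.
     */
    static List<String> readHead(BufferedReader in) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            // readLine() blocks until a whole line arrives; null means end of stream.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Returns the Content-Length value from the headers, or 0 if there is none. */
    static int contentLength(List<String> head) {
        for (String header : head) {
            int colon = header.indexOf(':');
            if (colon > 0 && header.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(header.substring(colon + 1).trim());
            }
        }
        return 0;
    }
}
```

An empty list from readHead means the connection was opened and closed with no request on it, so the server can close the socket instead of passing an empty string to the parser. Content-Length counts bytes, so a body should be read from the underlying stream as exactly that many bytes.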

Reading and printing HTML from website hangs up

I've been working on some Java code in which a string is converted into a URL and then used to download and output the corresponding page. Unfortunately, when I run the program, it just hangs. Does anyone have any suggestions?
Note: I've used import java.io.* and import java.net.*
public static boolean htmlOutput(String testURL) throws Exception {
URL myPage2 = new URL(testURL); //converting String to URL
System.out.println(myPage2);
BufferedReader webInput2 = new BufferedReader(
new InputStreamReader(myPage2.openStream()));
String individualLine=null;
String completeInput=null;
while ((individualLine = webInput2.readLine()) != null) {
//System.out.println(inputLine);
System.out.println(individualLine);
completeInput=completeInput+individualLine;
}//end while
webInput2.close();
return true;
}//end htmlOutput()
[Though this answer helped the OP, it is wrong. HttpURLConnection does follow redirects, so this could not be the OP's problem. I will remove it as soon as the OP removes the accepted mark.]
My guess is that you don't get anything back in the response stream because the page you are trying to connect to sends you a redirect response (e.g. 302).
Try to verify that by reading the response code and iterating over the response headers. There should be a header named Location with a new URL that you need to follow:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
int code = connection.getResponseCode();
Map<String, List<String>> map = connection.getHeaderFields();
// iterate over the map and find new url
If you are having trouble getting the above snippet to work, take a look at a working example.
You could do yourself a favor and use a third-party HTTP client such as Apache HttpClient, which can handle redirects. Otherwise you have to follow them manually.
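The "iterate over the map and find new url" step could be sketched as below. The class RedirectHeaders and the method findLocation are hypothetical names. Keep in mind that HttpURLConnection follows redirects by default, so to actually see a 3xx response you would first call setInstanceFollowRedirects(false) on the connection; it also does not follow redirects that switch protocol, for example from http to https.

```java
import java.util.List;
import java.util.Map;

final class RedirectHeaders {

    /** Returns the first Location value, matching the header name case-insensitively, or null. */
    static String findLocation(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey(); // the status line is stored under a null key
            if (name != null && name.equalsIgnoreCase("Location") && !entry.getValue().isEmpty()) {
                return entry.getValue().get(0);
            }
        }
        return null;
    }
}
```

The map returned by getHeaderFields() can be passed in directly.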

Get URL content with Basic Authentication with Java and async-http-client

I am writing a Java lib and need to perform a request to a URL - currently using async-http-client from ning - and fetch its content. So I have a get method that returns a String of the content of the fetched document. However, to be able to get it, I must perform HTTP basic authentication, and I'm not succeeding at this in my Java code:
public String get(String token) throws IOException {
String fetchURL = "https://www.eventick.com.br/api/v1/events/492";
try {
String encoded = URLEncoder.encode(token + ":", "UTF-8");
return this.asyncClient.prepareGet(fetchURL)
.addHeader("Authorization", "Basic " + encoded).execute().get().getResponseBody();
}
}
The code returns no error, it just doesn't fetch the URL because the authentication header is not being properly set, somehow.
With curl -u option I can easily get what I want:
curl https://www.eventick.com.br/api/v1/events/492 -u 'xxxxxxxxxxxxxxx:'
Returns:
{"events":[{"id":492,"title":"Festa da Bagaceira","venue":"Mangueirão de Paulista",
"slug":"bagaceira-fest", "start_at":"2012-07-29T16:00:00-03:00",
"links":{"tickets":[{"id":738,"name":"Normal"}]}}]}
How can this be done in Java with the async-http-client lib? Or, if you know how to do it another way, that would help too.
Any help is welcome!
You're close. You need to base 64 encode rather than URL encode. That is, you need
String encoded = Base64.getEncoder().encodeToString((user + ':' + password).getBytes(StandardCharsets.UTF_8));
rather than
String encoded = URLEncoder.encode(token + ":", "UTF-8");
(Note that for the benefit of others, since I'm answering 2 years later, in my answer I'm using the more standard "user:password" whereas your question has "token:". If "token:" is what you needed, then stick with that. But maybe that was part of the problem, too?)
Here is a short, self-contained, correct example:
package so17380731;
import com.ning.http.client.AsyncHttpClient;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.ws.rs.core.HttpHeaders;
public class BasicAuth {
public static void main(String... args) throws Exception {
try(AsyncHttpClient asyncClient = new AsyncHttpClient()) {
final String user = "StackOverflow";
final String password = "17380731";
final String fetchURL = "https://www.eventick.com.br/api/v1/events/492";
final String encoded = Base64.getEncoder().encodeToString((user + ':' + password).getBytes(StandardCharsets.UTF_8));
final String body = asyncClient
.prepareGet(fetchURL)
.addHeader(HttpHeaders.AUTHORIZATION, "Basic " + encoded)
.execute()
.get()
.getResponseBody(StandardCharsets.UTF_8.name());
System.out.println(body);
}
}
}
The documentation is very sketchy, but I think that you need to use a RequestBuilder following the pattern shown in the Request javadoc:
Request r = new RequestBuilder().setUrl("url")
.setRealm((new Realm.RealmBuilder()).setPrincipal(user)
.setPassword(admin)
.setRealmName("MyRealm")
.setScheme(Realm.AuthScheme.DIGEST).build());
r.execute();
(Obviously, this example is not Basic Auth, but there are clues as to how you would do it.)
FWIW, one problem with your current code is that a Basic Auth header uses base64 encoding not URL encoding; see the RFC2617 for details.
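To make the difference concrete, here is a small JDK-only sketch that builds the header value both ways. The class BasicAuthHeader and its method names are made up for illustration, the credentials are the example pair from RFC 2617, and the URLEncoder.encode(String, Charset) overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {

    /** Builds the value of the Authorization header for Basic authentication. */
    static String basicAuth(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** What URL encoding produces for the same input; a server will not accept this. */
    static String urlEncoded(String user, String password) {
        return URLEncoder.encode(user + ":" + password, StandardCharsets.UTF_8);
    }
}
```

For the user "Aladdin" and password "open sesame", basicAuth gives "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", while urlEncoded gives "Aladdin%3Aopen+sesame".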
Basically, do it like this:
BoundRequestBuilder request = asyncHttpClient
.preparePost(getUrl())
.setHeader("Accept", "application/json")
.setHeader("Content-Type", "application/json")
.setRealm(org.asynchttpclient.Dsl.basicAuthRealm(getUser(), getPassword()))
// ^^^^^^^^^^^-- this is the important part
.setBody(json);
Test can be found here:
https://github.com/AsyncHttpClient/async-http-client/blob/master/client/src/test/java/org/asynchttpclient/BasicAuthTest.java
Here is another way to add Basic authorization. You can use either of the two classes, AsyncHttpClient or HttpClient; in this case I will use AsyncHttpClient:
AsyncHttpClient client=new AsyncHttpClient();
Request request = client.prepareGet("https://www.eventick.com.br/api/v1/events/492").
setHeader("Content-Type","application/json")
.setHeader("Authorization","Basic b2pAbml1LXR2LmNvbTpnMGFRNzVDUnhzQ0ZleFQ=")
.setBody(jsonObjectRepresentation.toString()).build();
After adding the headers, execute the request:
ListenableFuture<Response> r = null;
//ListenableFuture<Integer> f= null;
try{
r = client.executeRequest(request);
System.out.println(r.get().getResponseBody());
}catch(IOException e){
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
client.close();
This may be useful to you.
