Good evening to all of you.
I want to fetch a webpage using the Socket class in Java, and I have done it like this:
import java.net.*;
import java.io.*;

class htmlPageFetch {
    public static void main(String[] args) {
        try {
            Socket s = new Socket("127.0.0.1", 80);
            DataInputStream dIn = new DataInputStream(s.getInputStream());
            DataOutputStream dOut = new DataOutputStream(s.getOutputStream());
            dOut.write("GET /index.php HTTP/1.0\n\n".getBytes());
            boolean more_data = true;
            String str;
            while (more_data) {
                str = dIn.readLine();
                if (str == null)
                    more_data = false;
                System.out.println(str);
            }
        } catch (IOException e) {
        }
    }
}
But it just gives me nulls.
Output
HTTP/1.1 302 Found
Date: Wed, 01 Dec 2010 13:49:02 GMT
Server: Apache/2.2.11 (Unix) DAV/2 mod_ssl/2.2.11 OpenSSL/0.9.8k PHP/5.2.9 mod_apreq2-20051231/2.6.0 mod_perl/2.0.4 Perl/v5.10.0
X-Powered-By: PHP/5.2.9
Location: http://localhost/xampp/
Content-Length: 0
Content-Type: text/html
null
I'm not sure if this is causing your problem, but HTTP expects carriage return and line feed for a newline:
dOut.write("GET /index.php HTTP/1.0\r\n\r\n".getBytes());
Also, it wouldn't hurt to flush and close your DataOutputStream:
dOut.flush();
dOut.close();
If you plan on doing anything more with this code than just connecting to simple test cases, I'd recommend using HttpURLConnection instead of implementing HTTP on a socket yourself. Otherwise, the result will contain more than just the web page: it will also include the HTTP status line and headers, which your code would need to parse.
Update:
Looking at the response you added, the 302 status along with the Location: header indicates that the page you are looking for has moved to http://localhost/xampp/ (see HTTP 302) and there is no longer any content at the original URL. This is something that can be handled automatically by HttpURLConnection or another library like Apache HttpClient. With a raw socket, you will need to parse the status code, parse the headers, open a new socket to the response's Location, and get the page from there. Depending upon the exact requirements of your assignment, you will probably want to familiarize yourself with the HTTP 1.0 specification, and the HTTP 1.1 specification as well.
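For reference, here is a minimal sketch of the HttpURLConnection approach against the same local URL. It is a sketch, not a drop-in for your assignment; following redirects is the default, the call below just makes that explicit:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class HtmlPageFetch {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://127.0.0.1/index.php");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(true); // the 302 to /xampp/ is handled for you
        System.out.println("Status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // body only; status line and headers are parsed for you
            }
        }
        conn.disconnect();
    }
}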
I think the code is working; you may just not be seeing the output because it's swamped by all the nulls you print. You should stop the loop at the first null.
More generally, DataInputStream and DataOutputStream are not the right classes for this job. Try this code instead:
public static void main(String[] args) throws IOException {
    Socket s = new Socket("127.0.0.1", 80);
    BufferedReader dIn = new BufferedReader(new InputStreamReader(s.getInputStream()));
    PrintStream dOut = new PrintStream(s.getOutputStream());
    dOut.print("GET /index.php HTTP/1.0\r\n\r\n"); // HTTP wants CRLF line endings
    dOut.flush(); // make sure the request actually goes out
    String str;
    while ((str = dIn.readLine()) != null) { // stop at the first null instead of printing it
        System.out.println(str);
    }
    s.close();
}
Why are you using a socket directly to perform an HTTP connection? It is a fine exercise, but it requires deep knowledge of the internals of the HTTP protocol. Why not just use the URL and URLConnection classes?
BufferedReader dIn = new BufferedReader(
        new InputStreamReader(new URL("http://127.0.0.1:80").openConnection().getInputStream()));
String str;
while ((str = dIn.readLine()) != null) {
    System.out.println(str);
}
Related
Hello, I'm making an HTTP client and trying to fetch google.com's HTML code. My problem is that the BufferedReader.readLine() call blocks endlessly. Is it because the remote server apparently doesn't send a blank line, or could my request be wrong?
I appreciate any help!
public static void main(String[] args) throws IOException {
    String uri = "www.google.com";
    int port = 80;
    String language = "en"; // hypothetical value; the variable is passed along but unused below
    Socket socket = new Socket(uri, port);
    PrintWriter toServer = new PrintWriter(socket.getOutputStream(), true);
    InputStream inputStream = socket.getInputStream();
    get(uri, port, language, socket, toServer, inputStream);
}
public static void get(String uri, int port, String language, Socket socket,
                       PrintWriter toServer, InputStream inputStream) {
    try {
        toServer.println("GET / HTTP/1.1");
        toServer.println("Host: " + uri + ":" + port);
        toServer.println();

        // Parse header
        StringBuilder stringBuilder = new StringBuilder();
        BufferedReader fromServer = new BufferedReader(new InputStreamReader(inputStream));
        String line;
        while ((line = fromServer.readLine()) != null) {
            stringBuilder.append(line);
        }
        System.out.println("done");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
You are sending an HTTP/1.1 request, which by default enables HTTP keep-alive. This means that the server might keep the TCP connection open after the response is sent, in order to accept more requests from the client. Your code instead assumes that the server will close the connection once the response is finished, by expecting readLine to return null. But since the server will not close the connection (or will do so only after some long timeout), the readLine will just block.
To fix this, either use HTTP/1.0 (which has keep-alive off by default) instead of HTTP/1.1, or explicitly tell the server that no more requests will be sent by adding a Connection: close header.
Please note that in general HTTP is way more complex than you might think if you've only seen a few examples. The problem you face in your question is just a glimpse of the problems you will run into if you continue down this path. If you really want to implement your own HTTP handling instead of using established libraries, please study the actual standard instead of assuming a specific behavior.
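Applied to your code, the second fix is a single extra header line (a sketch; everything else in your get method stays the same):
toServer.println("GET / HTTP/1.1");
toServer.println("Host: " + uri + ":" + port);
toServer.println("Connection: close"); // server closes when done, so readLine() eventually returns null
toServer.println();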
This is a Java method that tries to crawl a designated web page. I am using writeUTF and readUTF for socket communication with a server.
static void get_html(String host, String page, int port) throws IOException {
    Socket sock = new Socket(host, port);
    String msg = MessageFormat.format("GET {0} HTTP/1.1\r\nHost: {1}\r\n\r\n", page, host);
    DataOutputStream outToServer = new DataOutputStream(sock.getOutputStream());
    DataInputStream inFromServer = new DataInputStream(sock.getInputStream());
    InputStream stream = new ByteArrayInputStream(msg.getBytes(StandardCharsets.UTF_8));
    BufferedReader buf = new BufferedReader(new InputStreamReader(stream));
    String outMsg;
    while ((outMsg = buf.readLine()) != null) {
        System.out.println("Sending message: " + outMsg);
        outToServer.writeUTF(outMsg);
        String inMsg;
        try {
            inMsg = inFromServer.readUTF();
        } catch (EOFException eof) {
            break;
        }
        System.out.println(inMsg);
    }
    sock.close();
}
The reason I wrote it this way was to mimic the C code, where you have one while loop of send() making all deliveries from a buffer, and another while loop of recv() reading into a buffer until it hits null. When I execute my code, it just hangs; I suspect that is due to calling readUTF before I have finished sending all my messages. If this is the case, is there any way to fix it?
You can't do this. HTTP is defined as text lines. writeUTF() does not write text; it writes a special format starting with a 16-bit binary length word. Similarly, the HTTP server won't reply in that format to your readUTF() call. See the Javadoc.
You have to use binary streams and the write() method, with \r\n as the line terminator. Depending on the output format you may or may not be able to use readLine(). Better not to; then you don't have to write two pieces of code: use binary streams for reading as well.
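To illustrate, a minimal sketch of the same method using plain binary streams. Connection: close is added so the read loop ends at end of stream; treat this as a starting point, not a full HTTP client:
// Needs: java.io.*, java.net.Socket, java.nio.charset.StandardCharsets
static void get_html(String host, String page, int port) throws IOException {
    try (Socket sock = new Socket(host, port)) {
        String msg = "GET " + page + " HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
        OutputStream out = sock.getOutputStream();
        out.write(msg.getBytes(StandardCharsets.US_ASCII)); // raw bytes, no UTF length prefix
        out.flush();
        InputStream in = sock.getInputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            System.out.write(buf, 0, n); // status line, headers and body, byte for byte
        }
        System.out.flush();
    }
}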
In fact, you should throw it all away and use HttpURLConnection. Implementing HTTP is not as simple as it may hastily be supposed.
I have an assignment for school that involves writing a simple web crawler that crawls Wikipedia. The assignment stipulates that I can't use any external libraries so I've been playing around with the java.net.URL class. Based on the official tutorial and some code given by my professor I have:
public static void main(String[] args) {
    System.setProperty("sun.net.client.defaultConnectTimeout", "500");
    System.setProperty("sun.net.client.defaultReadTimeout", "1000");
    try {
        URL url = new URL(BASE_URL + "/wiki/Physics");
        InputStream is = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        String inputLine;
        int lineNum = 0;
        while ((inputLine = br.readLine()) != null && lineNum < 10) {
            System.out.println(inputLine);
            lineNum++;
        }
        is.close();
    } catch (MalformedURLException e) {
        System.out.println(e.getMessage());
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
In addition, the assignment requires that:
Your program should not continuously send requests to wiki. Your program
must wait for at least 1 second after every 10 requests
So my question is: where exactly in the above code is the "request" being sent? And how does this connection work? Is the entire webpage loaded in one go, or is it downloaded line by line?
I honestly don't really understand much about networking at all so apologies if I'm misunderstanding something fundamental. Any help would be much appreciated.
InputStream is = url.openStream();
The request is sent at the line above.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
This line wraps the resulting input stream so that the response can be read from it.
Calling url.openStream() initiates a new TCP connection to the server that the URL resolves to. An HTTP GET request is then sent over the connection. If all goes right (i.e., 200 OK), the server sends back the HTTP response message that carries the data payload that is served up at the specified URL. You then need to read the bytes from the InputStream that the openStream() method returns in order to retrieve the data payload into your program.
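As for the wait requirement: since each url.openStream() call sends one request, a simple sketch is to count the calls and sleep after every tenth one. The links list and the enclosing method's throws clause are assumptions here:
// Assumes an enclosing method declaring: throws IOException, InterruptedException
int requestCount = 0;
for (String path : links) { // links: a hypothetical list of wiki paths such as "/wiki/Physics"
    URL url = new URL(BASE_URL + path);
    try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
        // ... read and process the page as in the main method above ...
    }
    requestCount++;
    if (requestCount % 10 == 0) {
        Thread.sleep(1000); // wait at least 1 second after every 10 requests
    }
}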
I'm trying to write an HTTP server using sockets and I've run into this problem.
As everyone knows, an HTTP request looks like this:
GET /index.html HTTP/1.1
Cache-Control: max-age=0
Host: 127.0.0.1
Accept:xxxxx
User-Agent: xxxx
Connection: keep-alive
CRLF
This is message body!
My question is how I can get the full HTTP request, including the message body.
I tried writing it like this:
ServerSocket serverSocket = new ServerSocket(8000);
while (true) {
    Socket socket = serverSocket.accept();
    new Thread() {
        public void run() {
            try {
                InputStream is = socket.getInputStream();
                BufferedReader input = new BufferedReader(new InputStreamReader(is));
                String line = null;
                while ((line = input.readLine()) != null) {
                    System.out.println(line);
                }
                System.out.print("finish");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }.start();
}
And the console never prints "finish". Then I changed it to this:
ServerSocket serverSocket = new ServerSocket(8000);
while (true) {
    Socket socket = serverSocket.accept();
    new Thread() {
        public void run() {
            try {
                InputStream is = socket.getInputStream();
                BufferedReader input = new BufferedReader(new InputStreamReader(is));
                String line = null;
                while (input.ready()) {
                    line = input.readLine();
                    System.out.println(line);
                }
                System.out.println("finish");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }.start();
}
Things got better: we can see "finish"! But if I refresh the page a little faster, the BufferedReader is not ready yet and we never enter the while loop.
I want to print the whole request and then "finish".
Please help me.
Thanks a lot!!
Both your approaches are incorrect.
In the first one, input.readLine() will return null only when the end of the stream has been reached, not when the request ends. That means you'll loop there for as long as the browser keeps the TCP connection open, which might take a while. Plus, multiple requests might be sent over the same connection, so you might end up printing all of them (I don't know if that's what you want to do).
In the second one, you have a timing problem. input.ready() checks whether the receive buffer has any content to read, not whether the request has ended. So you might end up printing only part of the request instead of waiting for the whole thing. With this approach and the right timings, you might print part of a request, multiple requests, or anything in between (like a request and a half).
Also note that HTTP GET messages almost never carry any payload, and no browser will send a request like the one in your example.
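For completeness, here is a rough sketch of the usual approach: read header lines until the blank line, remember Content-Length, then read exactly that many characters of body. It uses ISO-8859-1 so that one char corresponds to one byte, and it assumes the client sends a Content-Length header at all:
// Needs: java.io.*, java.net.Socket, java.nio.charset.StandardCharsets
static void handle(Socket socket) throws IOException {
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
    int contentLength = 0;
    String line;
    while ((line = reader.readLine()) != null && !line.isEmpty()) { // headers end at the blank line
        System.out.println(line);
        if (line.toLowerCase().startsWith("content-length:")) {
            contentLength = Integer.parseInt(line.substring("content-length:".length()).trim());
        }
    }
    char[] body = new char[contentLength]; // with ISO-8859-1, char count equals byte count
    int read = 0;
    while (read < contentLength) {
        int n = reader.read(body, read, contentLength - read);
        if (n == -1) break;
        read += n;
    }
    System.out.println(new String(body, 0, read));
    System.out.println("finish");
}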
I am wondering how to properly manage the HTTP response below, considering that the file I am sending to the client links to other files, which will trigger future HTTP requests.
I know that I can close the PrintWriter, which will indicate to the client that the body is finished, but if I do that I don't see how I can receive subsequent requests for the pages linked from "first.html". I tried to include a Content-Length header, but it seems I may have calculated the length incorrectly, because any attempt to read from the input stream after sending "first.html" blocks. That tells me the client doesn't realize that first.html has finished sending. I've read over RFC 2616 but frankly have trouble understanding it without a proper example. I'm a real child when it comes to protocols, so any help would be appreciated!
public static void main(String[] args) throws Exception {
    ServerSocket serverSocket = new ServerSocket(80);
    Socket clientSocket = serverSocket.accept();
    BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
    PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);
    String s;
    while (!(s = in.readLine()).isEmpty()) {
        System.out.println(s);
    }
    out.write("HTTP/1.0 200 OK\r\n");
    out.write("Content-Type: text/html\r\n");
    out.write("Content-Length: 1792\r\n");
    out.write("\r\n");
    File html = new File("/Users/tru/Documents/JavaScript/first.html");
    BufferedReader htmlreader = new BufferedReader(new InputStreamReader(new FileInputStream(html)));
    int c;
    while ((c = htmlreader.read()) != -1) {
        out.write(c);
    }
    out.flush();
}
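One way to avoid miscounting is to let the file's byte size drive the header and to send the body as raw bytes instead of characters. A minimal sketch, reusing the same file path; with an accurate Content-Length the client knows where the body ends, so the connection can stay open for the next request:
File html = new File("/Users/tru/Documents/JavaScript/first.html");
OutputStream rawOut = clientSocket.getOutputStream();
String headers = "HTTP/1.0 200 OK\r\n"
        + "Content-Type: text/html\r\n"
        + "Content-Length: " + html.length() + "\r\n" // exact size of the file in bytes
        + "Connection: keep-alive\r\n" // ask the client to reuse the connection
        + "\r\n";
rawOut.write(headers.getBytes(StandardCharsets.US_ASCII));
try (FileInputStream fileIn = new FileInputStream(html)) {
    byte[] buf = new byte[4096];
    int n;
    while ((n = fileIn.read(buf)) != -1) {
        rawOut.write(buf, 0, n); // raw bytes, so the count matches the header exactly
    }
}
rawOut.flush(); // keep clientSocket open and loop back to read the next request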