I have an assignment for school that involves writing a simple web crawler that crawls Wikipedia. The assignment stipulates that I can't use any external libraries so I've been playing around with the java.net.URL class. Based on the official tutorial and some code given by my professor I have:
public static void main(String[] args) {
System.setProperty("sun.net.client.defaultConnectTimeout", "500");
System.setProperty("sun.net.client.defaultReadTimeout", "1000");
try {
URL url = new URL(BASE_URL + "/wiki/Physics");
InputStream is = url.openStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String inputLine;
int lineNum = 0;
while ((inputLine = br.readLine()) != null && lineNum < 10) {
System.out.println(inputLine);
lineNum++;
}
is.close();
}
catch (MalformedURLException e) {
System.out.println(e.getMessage());
}
catch (IOException e) {
System.out.println(e.getMessage());
}
}
In addition, the assignment requires that:
Your program should not continuously send requests to wiki. Your program
must wait for at least 1 second after every 10 requests
So my question is: where exactly in the above code is the "request" being sent? And how does this connection work? Is the entire webpage loaded in one go, or is it downloaded line by line?
I honestly don't really understand much about networking at all so apologies if I'm misunderstanding something fundamental. Any help would be much appreciated.
InputStream is = url.openStream();
The request is sent at the line above.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
This line only wraps the returned input stream in a reader; the data is actually read by the later br.readLine() calls.
Calling url.openStream() initiates a new TCP connection to the server that the URL resolves to. An HTTP GET request is then sent over the connection. If all goes right (i.e., 200 OK), the server sends back the HTTP response message that carries the data payload that is served up at the specified URL. You then need to read the bytes from the InputStream that the openStream() method returns in order to retrieve the data payload into your program.
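For the "wait at least 1 second after every 10 requests" requirement, one possible approach is to count each call that reaches openStream() and pause after every tenth one. This is a minimal sketch, not a reference solution for the assignment: the class RequestThrottle and its methods are hypothetical names, and the pause length is a constructor parameter so the counting logic can be exercised without waiting a full second.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: counts requests and pauses after every `batchSize` of them.
class RequestThrottle {
    private final int batchSize;
    private final long pauseMillis;
    private int sentInBatch = 0;
    private int pauses = 0;

    RequestThrottle(int batchSize, long pauseMillis) {
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    // Call once for every request that has been sent.
    void requestSent() {
        sentInBatch++;
        if (sentInBatch == batchSize) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            sentInBatch = 0;
            pauses++;
        }
    }

    int pauseCount() {
        return pauses;
    }

    // Fetches one page; url.openStream() is the call that sends the HTTP GET.
    String fetch(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (InputStream is = url.openStream();
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                page.append(line).append('\n');
            }
        } finally {
            requestSent(); // count the request even if reading failed
        }
        return page.toString();
    }
}
```

For the assignment's numbers this would be constructed as new RequestThrottle(10, 1000), with every page fetched through the same instance.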
I am seeing this exception:
java.net.SocketTimeoutException: Timeout attempting to read data from the socket
Here's the code generating it:
public static String extractBody(HttpServletRequest request) {
StringBuffer sb = new StringBuffer();
String line = null;
try {
//BufferedReader reader = request.getReader();
BufferedReader reader = new BufferedReader(new InputStreamReader(request.getInputStream()));
while ((line = reader.readLine()) != null) {
sb.append(line);
}
} catch (Exception e) {
logger.fatal("Failed to read from socket with content-length: {}", request.getHeader("Content-Length"));
e.printStackTrace();
}
return sb.toString();
}
When it happens, the Content-Length that gets logged is non-zero, for example:
Failed to read from socket with content-length: 279645
What is causing this timeout? Is it that the socket was left unclosed by the client? Is there something else I am missing? Is there a different way I should be reading the body data from a servlet request? Most of the requests work fine; I only see this error occasionally, so it may be tied to a certain client version or platform.
The Content-Length header didn't match the actual length of the body: fewer bytes arrived than the header declared, so the server kept waiting for more data until the read timed out.
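One way to make such a mismatch visible is to read the body by byte count and compare what arrived with the declared length. The following is a minimal sketch under assumptions: BodyReader and readBody are hypothetical names, and a plain InputStream stands in for request.getInputStream() so the example runs without a servlet container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class BodyReader {
    // Reads up to declaredLength bytes; stops early if the stream ends first.
    static byte[] readBody(InputStream in, int declaredLength) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int remaining = declaredLength;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                break; // sender closed the stream before sending everything it declared
            }
            body.write(buf, 0, n);
            remaining -= n;
        }
        return body.toByteArray();
    }
}
```

If the returned array is shorter than the declared length, the sender delivered less than it promised. On a live socket where the client keeps the connection open, the read would still block until the timeout fires, so this helps with diagnosis rather than preventing the exception.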
I am wondering how to properly manage the following HTTP response considering the file I am sending to the client contains other linked files representing future HTTP requests.
I know that I can close the PrintWriter, which will indicate to the client that the body is finished, but if I do that I don't see how I can receive subsequent requests for the linked pages within "first.html". I tried to include the Content-Length header, but it seems I may have calculated the length incorrectly, as any attempt to read from the input stream after sending "first.html" blocks. That tells me the client doesn't realize that first.html has finished sending. I've read over RFC 2616 but frankly have trouble understanding it without a proper example. I'm a real child when it comes to protocols, so any help would be appreciated!
public static void main(String[] args) throws Exception{
ServerSocket serverSocket = new ServerSocket(80);
Socket clientSocket = serverSocket.accept();
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
PrintWriter out = new PrintWriter(clientSocket.getOutputStream(),true);
String s;
while (!(s = in.readLine()).isEmpty()) {
System.out.println(s);
}
out.write("HTTP/1.0 200 OK\r\n");
out.write("Content-Type: text/html\r\n");
out.write("Content-Length: 1792\r\n");
out.write("\r\n");
File html = new File("/Users/tru/Documents/JavaScript/first.html");
BufferedReader htmlreader = new BufferedReader(new InputStreamReader(new FileInputStream(html)));
int c;
while((c = htmlreader.read()) > 0){
out.write(c);
}
}
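No answer is recorded for this question, but two things in the code above could plausibly cause the stall and are worth ruling out. First, auto-flush on a PrintWriter is triggered only by println, printf, and format, so output written with write() may still be sitting in a buffer. Second, Content-Length counts bytes, while the loop writes characters and the value is hard-coded. A minimal sketch that derives the length from the encoded body, assuming UTF-8 content; HttpResponseWriter and writeHtml are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class HttpResponseWriter {
    // Writes a complete response whose Content-Length is derived from the encoded body.
    static void writeHtml(OutputStream out, String html) throws IOException {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.0 200 OK\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush(); // push everything to the client before waiting for the next request
    }
}
```

Separately, the code calls accept() only once, so any request that arrives on a new connection is never seen; looping over accept() would cover that case.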
I'm using a WebSocket to send data. This is the JavaScript code:
socket= new WebSocket('ws://localhost:10302/socket');
socket.onopen= function() {
socket.send('delete structure'+c);
}
On the server side I'm using Java, and this is the code:
try {
standardiste = new ServerSocket(10302);
while(true) {
System.out.println("listening data from socket");
socket = standardiste.accept();
try {
BufferedReader entree = new BufferedReader(new InputStreamReader(socket.getInputStream()));
while(entree!=null)
{
System.out.println(entree.readLine());
}
}
catch(IOException exc) {
}
socket.close();
}
}
How can I read the data that was sent?
What you need is
String line;
while((line = entree.readLine()) != null){
System.out.println(line);
}
In your loop the condition entree != null is always true, because entree is the reader itself and never becomes null, so the loop would keep calling readLine() and printing null after the stream has ended. BufferedReader.readLine() reads a single line (up to a line terminator) from the buffer and returns null once the end of the stream is reached.
By comparing the returned line to null (readLine() != null), you keep reading until the end of the transmission.
Edit:
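I'm afraid the WebSocket protocol is not the same thing as a plain TCP socket as exposed by java.net.Socket. A WebSocket client first sends an HTTP upgrade handshake and only then sends its messages, wrapped in WebSocket frames. Your ServerSocket therefore receives just the handshake headers; because the server never answers the handshake, the connection is never opened on the JavaScript side and the actual data is never sent, simply because the protocols don't match up. Try using a Java WebSocket server implementation instead of a raw ServerSocket. Here is a good tutorial.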
I'm trying to replace a Netcat command that I'm running in my terminal that will reset some data on a server. The netcat command looks like this:
echo '{"id":1, "method":"object.deleteAll", "params":["subscriber"]} ' | nc x.x.x.x 3994
I've been trying to implement it in Java since I would like to be able to call this command from an application I'm developing. I'm having issues with it, though: the command is never executed on the server.
This is my java code:
try {
Socket socket = new Socket("x.x.x.x", 3994);
String string = "{\"id\":1,\"method\":\"object.deleteAll\",\"params\":[\"subscriber\"]}";
DataInputStream is = new DataInputStream(socket.getInputStream());
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
os.write(string.getBytes());
os.flush();
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
is.close();
os.close();
} catch (IOException e) {
e.printStackTrace();
}
The code also hangs on the while loop that should read the InputStream, I have no idea why. I've been using Wireshark to capture the packets and the data that is going out looks the same:
{"id":1,"method":"object.deleteAll","params":["subscriber"]}
Perhaps the rest of the packets are not shaped in the same way but I really can't understand why that would be. Perhaps I am writing the string in a faulty way to the OutputStream? I have no idea :(
Note that I posted a question similar to this yesterday when I didn't properly understand the problem:
Can't post JSON to server with HTTP Client in Java
EDIT:
These are the possible results I get from running the nc command, I would expect to get the same messages to the InputStream if the OutputStream sends correct data in a correct way:
Wrong arguments:
{"id":1,"error":{"code":-32602,"message":"Invalid entity type: subscribe"}}
Ok, successful:
{"id":1,"result":100}
Nothing to delete:
{"id":1,"result":0}
Wow, I really had no idea. I experimented with some different Writers like "buffered writer" and "print writer" and it seems the PrintWriter was the solution. Although I couldn't use the PrintWriter.write() nor the PrintWriter.print() methods. I had to use PrintWriter.println().
If someone has the answer to why other writers wouldn't work and explain how they would impact the data sent to the server I will gladly accept that as the solution.
try {
Socket socket = new Socket(InetAddress.getByName("x.x.x.x"), 3994);
String string = "{\"id\":1,\"method\":\"object.deleteAll\",\"params\":[\"subscriber\"]}";
DataInputStream is = new DataInputStream(socket.getInputStream());
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
PrintWriter pw = new PrintWriter(os);
pw.println(string);
pw.flush();
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
is.close();
os.close();
} catch (IOException e) {
e.printStackTrace();
}
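I think the server is expecting a newline at the end of the message. Note that echo appends a newline to its output, so the nc command sends one, whereas your original Java code does not. To confirm this, try your original code with write() and add \n at the end of the string.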
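The newline theory can be checked with a small helper that always terminates the message explicitly. This is a minimal sketch: LineSender and sendLine are hypothetical names, UTF-8 is an assumed encoding, and any OutputStream (for example socket.getOutputStream()) can be passed in.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LineSender {
    // Sends one message terminated by '\n', matching what `echo ... | nc` puts on the wire.
    static void sendLine(OutputStream out, String message) throws IOException {
        out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```

This would also explain why PrintWriter.println() worked while write() and print() did not: println() appends the platform line separator, and the other two send the string as is. Writing "\n" explicitly avoids depending on the platform separator, which is "\r\n" on Windows.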
On Servlet side:
for (GameParticipant activePlayer : connector.activePlayers) {
    activePlayer.out.println(response);
    activePlayer.out.flush();
    System.out.println("Server sending board state to all game participants:" + response);
}
(activePlayer.out is a PrintWriter saved in the server from the HttpResponse object obtained when that client connected the first time)
On client side:
private void receiveMessageFromServer() {
    try {
        BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        StringBuilder sb = new StringBuilder();
        String input = null;
        // readLine() returns null only once the server has ended the response
        while ((input = br.readLine()) != null) {
            sb.append(input).append(" ");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
For some reason, this communication works only the first time, when the client requests a connection and waits for the response in the same method, while the server uses the PrintWriter obtained directly from the available HttpResponse in the doPost method. After that, when the servlet tries to reuse the PrintWriter to talk to the client outside of a doPost method, nothing happens; the message never gets to the client. Any ideas?
P.S. In client constructor:
try {
url = new URL("http://localhost:8182/stream");
conn = (HttpURLConnection) url.openConnection();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException ioE) {
ioE.printStackTrace();
}
The response output stream isn't valid outside the doPost() method, or more properly speaking the service() method. It can only be used to send one response. However PrintWriter swallows exceptions, as you will find when you check its error status, so you didn't see the problem.
In other words your entire server-side design is flawed. You can't misuse the Servlet Specification in that way.
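The point about PrintWriter swallowing exceptions can be demonstrated without a servlet container. In this minimal sketch, SwallowedErrorDemo, ClosedStream, and printAndCheck are hypothetical names, and ClosedStream is only a stand-in for an output stream that can no longer be written to; how a particular container's response writer behaves after service() returns is not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

class SwallowedErrorDemo {
    // Stand-in for a stream that is no longer usable: every write fails.
    static class ClosedStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("stream closed");
        }
    }

    // Returns true if the write failed, which println() alone never reveals.
    static boolean printAndCheck(String message) {
        PrintWriter out = new PrintWriter(new ClosedStream());
        out.println(message);    // no exception is thrown here
        return out.checkError(); // flushes, then reports whether an IOException occurred
    }
}
```

Calling checkError() after each println() on the saved writers would at least surface the failure on the server side, although it does not make the design valid.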