servlet socket read timeout via reader - java

I am seeing this exception:
java.net.SocketTimeoutException: Timeout attempting to read data from the socket
Here's the code generating it:
public static String extractBody(HttpServletRequest request) {
    StringBuffer sb = new StringBuffer();
    String line = null;
    try {
        //BufferedReader reader = request.getReader();
        BufferedReader reader = new BufferedReader(new InputStreamReader(request.getInputStream()));
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
    } catch (Exception e) {
        logger.fatal("Failed to read from socket with content-length: {}", request.getHeader("Content-Length"));
        e.printStackTrace();
    }
    return sb.toString();
}
When it happens, the content-length that gets logged is non-zero, for example:
Failed to read from socket with content-length: 279645
What is causing this timeout? Is it that the socket was left unclosed by the client? Is there something else I am missing? Is there a different way I should be reading the body data from a servlet request? Most requests work fine; I only see this error occasionally, so it may be tied to a particular client version or platform.

The Content-Length header didn't match the amount of data the client actually sent, so the server kept waiting for data that never arrived until the socket read timed out.
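For illustration only (not part of the original answer), here is a minimal sketch of reading the body while counting the bytes actually received against the declared Content-Length, so a short body shows up in the log as a mismatch rather than only as a read timeout. It assumes the usual java.io and servlet imports and the same logger as above:
// Hedged sketch: tracks received bytes vs. the declared Content-Length.
public static String extractBody(HttpServletRequest request) throws IOException {
    int declared = request.getContentLength();   // -1 if the header is absent
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int received = 0;
    try (InputStream in = request.getInputStream()) {
        int n;
        while ((n = in.read(buf)) != -1) {
            body.write(buf, 0, n);
            received += n;
        }
    } catch (IOException e) {
        logger.fatal("Read failed after {} of {} declared bytes", received, declared);
        throw e;
    }
    String charset = request.getCharacterEncoding() != null ? request.getCharacterEncoding() : "UTF-8";
    return body.toString(charset);
}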

Related

Http response code 429 while reading HTML

In Java I want to read and save all the HTML from a URL (Instagram), but I'm getting error 429 (Too Many Requests). I think it is because I am trying to read more lines than the request limit allows.
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    URLConnection con = url.openConnection();
    InputStream is = con.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
And the error is:
Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/
It also shows that the error occurs at this line:
InputStream is =con.getInputStream();
Does anybody have an idea why I get this error and/or what to do to solve it?
The problem might have been caused by the connection not being closed/disconnected.
For the input, try-with-resources is also useful: it closes the reader automatically, even on an exception or an early return. Also, you constructed an InputStreamReader that uses the default encoding of the machine the application runs on, but you need the charset of the URL's content.
readLine returns the line without its line ending (which in general is very useful), so append one yourself.
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    HttpURLConnection con = (HttpURLConnection) url.openConnection();  // disconnect() needs HttpURLConnection
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream(), "UTF-8"))) {   // closes in, even on exception
        String line;
        while ((line = in.readLine()) != null) {
            contentBuilder.append(line).append("\r\n");
        }
    } finally {
        con.disconnect();
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
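As a side note (not part of pb2q's answer): HTTP 429 means the server is rate-limiting the client, so it can also help to check the status code and the optional Retry-After header before reading, and back off instead of failing. A minimal sketch, assuming the java.net imports and the delay-in-seconds form of Retry-After:
// Hedged sketch: back off when the server answers 429 Too Many Requests.
static boolean backedOff(String address) throws Exception {
    HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
    try {
        if (con.getResponseCode() == 429) {
            // Retry-After may be absent; this sketch assumes its delay-in-seconds form.
            String retryAfter = con.getHeaderField("Retry-After");
            long waitSeconds = (retryAfter != null) ? Long.parseLong(retryAfter) : 60;
            Thread.sleep(waitSeconds * 1000);   // wait, then let the caller retry
            return true;
        }
        return false;   // safe to read con.getInputStream() as in the code above
    } finally {
        con.disconnect();
    }
}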

BufferedReader is not reading body of request

I am trying to read the data from an HttpPost, but when I read the data from BufferedReader I only get the header info. Please see the code below.
Here is the server
try {
    ServerSocket server = new ServerSocket(8332);
    System.out.println("Listening for connection on port 8332 ....");
    while (true) {
        try (Socket socket = server.accept()) {
            BufferedReader buffer = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            String request = "";
            String line;
            while ((line = buffer.readLine()) != null) {
                System.out.println(line);
                request += line;
            }
            System.out.print("port 8332 reading: " + request);
        } catch (IOException e) {
            System.out.print(e.getMessage());
        }
    }
} catch (IOException e) {
    System.out.print(e.getMessage());
}
Here is the Client
HttpClient client = HttpClientBuilder.create().build();
HttpPost post = new HttpPost("http://localhost:8332");
try {
    StringEntity params = new StringEntity("details={\"name\":\"myname\",\"age\":\"20\"} ");
    post.addHeader("content-type", "application/x-www-form-urlencoded");
    post.setEntity(params);
    client.execute(post);
} catch (IOException e) {
    System.out.println(e);
}
When I run this program I just get the following output
Listening for connection on port 8332 ....
POST / HTTP/1.1
content-type: application/x-www-form-urlencoded
Content-Length: 37
Host: localhost:8332
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.6 (Java/1.8.0_131)
Accept-Encoding: gzip,deflate
Upon debugging it seems like the program is not exiting this while loop
while ((line = buffer.readLine()) != null) {
    System.out.println(line);
    request += line;
}
But I can't figure out why. Please help, I have been stuck on this all day.
Thanks in advance.
But I can't figure out why.
The only way that your server will get a null from buffer.readLine() is if the client closes its socket output stream.
The problem is that the client side is trying to keep the connection alive ... which means that it won't close its socket output stream. That means that the server needs to respect the "content-length" header; i.e. count the number of bytes read rather than looking for an end-of-stream.
Fundamentally, your server side is not implementing the HTTP 1.1 specification correctly.
What to do?
Well my advice is to not try to implement HTTP starting from sockets. Use an existing framework ... or the Apache HttpComponents library.
But if you insist on doing it this way1, read the HTTP specification thoroughly before you start trying to implement it. And consult the spec whenever you run into problems with your implementation to check that you are doing the right thing.
1 - Definition: Masochism - the enjoyment of an activity that appears to be painful or tedious.
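To make the Content-Length advice above concrete, here is a rough sketch of mine (not the answerer's code) that could replace the try block inside the question's while (true) loop: it reads the headers up to the blank line, reads exactly Content-Length bytes of body, and sends a minimal response so the keep-alive client doesn't hang. A real server must implement RFC 7230 properly (chunked encoding, multi-byte charsets, etc.).
try (Socket socket = server.accept();
     InputStream in = socket.getInputStream();
     OutputStream out = socket.getOutputStream()) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "ISO-8859-1"));
    int contentLength = 0;
    String line;
    // Headers end at the first empty line.
    while ((line = reader.readLine()) != null && !line.isEmpty()) {
        System.out.println(line);
        if (line.toLowerCase().startsWith("content-length:")) {
            contentLength = Integer.parseInt(line.substring("content-length:".length()).trim());
        }
    }
    // Read exactly contentLength characters of body (fine for a single-byte charset).
    char[] body = new char[contentLength];
    int read = 0;
    while (read < contentLength) {
        int n = reader.read(body, read, contentLength - read);
        if (n == -1) break;
        read += n;
    }
    System.out.println("port 8332 reading: " + new String(body, 0, read));
    // Reply so the client's keep-alive connection doesn't wait forever.
    out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n".getBytes("ISO-8859-1"));
    out.flush();
} catch (IOException e) {
    System.out.print(e.getMessage());
}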
Use !=0 instead of !=null in your while loop:
while((line = buffer.readLine()).length() !=0) {
output:
port 8332 reading: POST / HTTP/1.1content-type:
application/x-www-form-urlencodedContent-Length: 18Host:
localhost:8332Connection: Keep-AliveUser-Agent:
Apache-HttpClient/4.5.3 (Java/1.8.0_171)Accept-Encoding:
gzip,deflateorg.apache.http.NoHttpResponseException: localhost:8332
failed to respond

Reading from a URL in java: when is a request actually sent?

I have an assignment for school that involves writing a simple web crawler that crawls Wikipedia. The assignment stipulates that I can't use any external libraries so I've been playing around with the java.net.URL class. Based on the official tutorial and some code given by my professor I have:
public static void main(String[] args) {
    System.setProperty("sun.net.client.defaultConnectTimeout", "500");
    System.setProperty("sun.net.client.defaultReadTimeout", "1000");
    try {
        URL url = new URL(BASE_URL + "/wiki/Physics");
        InputStream is = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        String inputLine;
        int lineNum = 0;
        while ((inputLine = br.readLine()) != null && lineNum < 10) {
            System.out.println(inputLine);
            lineNum++;
        }
        is.close();
    }
    catch (MalformedURLException e) {
        System.out.println(e.getMessage());
    }
    catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
In addition, the assignment requires that:
Your program should not continuously send requests to wiki. Your program
must wait for at least 1 second after every 10 requests
So my question is, where exactly in the above code is the "request" being sent? And how does this connection work? Is the entire webpage being loaded in one go, or is it being downloaded line by line?
I honestly don't really understand much about networking at all so apologies if I'm misunderstanding something fundamental. Any help would be much appreciated.
InputStream is = url.openStream();
The request is sent at the line above.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
This line just wraps the resulting input stream so you can read from it.
Calling url.openStream() initiates a new TCP connection to the server that the URL resolves to. An HTTP GET request is then sent over the connection. If all goes right (i.e., 200 OK), the server sends back the HTTP response message that carries the data payload that is served up at the specified URL. You then need to read the bytes from the InputStream that the openStream() method returns in order to retrieve the data payload into your program.
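To connect this to the assignment's throttling requirement: each call to openStream() is one request, so a simple counter plus a pause is enough. A rough sketch (not from the answer; the path list is a placeholder and the imports are the same as in the question):
// Hedged sketch: pause for at least one second after every 10 requests.
static void crawl(java.util.List<String> paths) throws IOException, InterruptedException {
    int requests = 0;
    for (String path : paths) {
        URL url = new URL(BASE_URL + path);            // e.g. "/wiki/Physics"
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String inputLine;
            while ((inputLine = br.readLine()) != null) {
                // process each line of the page here
            }
        }
        requests++;
        if (requests % 10 == 0) {
            Thread.sleep(1000);                        // required wait after every 10 requests
        }
    }
}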

read socket data with java

I'm using a WebSocket to send data; this is the code (JavaScript):
socket = new WebSocket('ws://localhost:10302/socket');
socket.onopen = function() {
    socket.send('delete structure' + c);
};
On the server side I'm using Java, and this is the code:
try {
    standardiste = new ServerSocket(10302);
    while (true) {
        System.out.println("listening data from socket");
        socket = standardiste.accept();
        try {
            BufferedReader entree = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            while (entree != null) {
                System.out.println(entree.readLine());
            }
        }
        catch (IOException exc) {
        }
        socket.close();
    }
}
How can I read the data that was sent?
What you need is
String line;
while ((line = entree.readLine()) != null) {
    System.out.println(line);
}
What you were doing was tying the BufferedReader to the Socket, but never reading anything from it. That's where the BufferedReader.readLine() method comes in: it reads a single line (up to the line-ending character) from the buffer.
By comparing the result to null (readLine() != null) you keep reading until the end of the transmission is reached.
Edit:
I'm afraid the WebSocket protocol is not what a plain Java Socket speaks: the server receives just the handshake headers but can't decode any of the actual data being sent, simply because the protocols don't match up. Try using a Java WebSocket implementation instead. Here is a good tutorial.
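For illustration, a minimal sketch of a server endpoint using the standard javax.websocket API (JSR 356; it needs a container such as Tomcat or Jetty, and the class name is just a placeholder):
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// Hedged sketch: the container handles the WebSocket handshake and framing.
@ServerEndpoint("/socket")
public class DeleteStructureEndpoint {

    @OnOpen
    public void onOpen(Session session) {
        System.out.println("Client connected: " + session.getId());
    }

    @OnMessage
    public void onMessage(String message, Session session) {
        // Receives text frames such as "delete structure..." from the browser.
        System.out.println("Received: " + message);
    }
}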

How to read compressed HTML page with Content-Encoding : gzip

I request a web page that sends a Content-Encoding: gzip header, but I got stuck on how to read it.
My code:
try {
    URLConnection connection = new URL("http://jquery.org").openConnection();
    String html = "";
    BufferedReader in = null;
    connection.setReadTimeout(10000);
    in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        html += inputLine + "\n";
    }
    in.close();
    System.out.println(html);
    System.exit(0);
} catch (IOException ex) {
    Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}
The output looks very messy (I was unable to paste it here; it's just a jumble of symbols).
I believe this is compressed content; how do I parse it?
Note:
If I change jquery.org to jquery.com (which doesn't send that header), my code works fine.
Actually, this is pb2q's answer, but I'm posting the full code for future readers:
try {
    URLConnection connection = new URL("http://jquery.org").openConnection();
    String html = "";
    BufferedReader in = null;
    connection.setReadTimeout(10000);
    // The changed part
    if (connection.getHeaderField("Content-Encoding") != null && connection.getHeaderField("Content-Encoding").equals("gzip")) {
        in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream())));
    } else {
        in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    }
    // End
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        html += inputLine + "\n";
    }
    in.close();
    System.out.println(html);
    System.exit(0);
} catch (IOException ex) {
    Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}
There is a class for this: GZIPInputStream. It is an InputStream and so is very transparent to use.
There are two cases with a Content-Encoding: gzip header:
If the data is already compressed by the application, a Content-Encoding: gzip header causes it to be compressed again, so it ends up double-compressed; this is because of HTTP compression.
If the data is not compressed by the application, Content-Encoding: gzip causes it to be compressed (usually with gzip) and it is automatically decompressed (un-zipped) before it reaches the client. Un-zipping is a default feature of most web browsers: the browser will un-zip the body if it finds a Content-Encoding: gzip header in the response.
