Slow reading of HTTP response using InputStreamReader and BufferedReader? - java

I have the following Java code that sends an HTTP request to a web server and reads the response:
StringBuilder response = new StringBuilder(50000);
URL url2 = new URL(ServiceURL);
connection = (HttpURLConnection)url2.openConnection();
connection.setRequestMethod("POST");
//... (some more connection settings) ...
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(connection.getOutputStream(), "UTF-8"));
wr.write(Request);
wr.flush ();
wr.close ();
InputStream is = connection.getInputStream();
BufferedReader rd = new BufferedReader(new InputStreamReader(is));
int i = 0;
while ((i = rd.read()) > 0) {
response.append((char)i);
}
It works for most cases, but I have a problem with one server that returns a rather large XML (something like 500KB; I guess this is pretty large for just a bunch of text..), where I keep getting a read timeout exception.
I believe it's not a network problem, because I've tried making the same request using curl and the response just arrived all right and pretty quick, something like two seconds.
When I look at what's going on on the network (using Wireshark to capture the packets), I notice that the TCP receive window on my computer fills up at some point. The TCP stack sometimes survives this; I can see the server sending TCP keep-alives to keep the connection up, but in the end the TCP connection just breaks down.
Could it be that the reading part of the code (appending the received response character-by-character) is slowing my code down? Is there a more efficient way to read an HTTP response?

Reading character by character is quite slow, yes. Try reading chunks at a time into a buffer:
char[] buf = new char[2048];
int charsRead;
while((charsRead = rd.read(buf, 0, 2048)) > 0) {
response.append(buf, 0, charsRead);
}
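A slightly fuller sketch of the same idea, assuming the response body is UTF-8 (the question writes the request as UTF-8 but decodes the response with the platform default charset). The class and method names are illustrative and not from the original post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ChunkedBodyReader {
    // Reads the whole stream into a String, up to 8192 chars per read call.
    // Assumption: the body is UTF-8; use the charset from Content-Type if it differs.
    static String readAll(InputStream in) throws IOException {
        StringBuilder response = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader rd = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int charsRead;
            // read(char[], int, int) returns -1 only at end of stream
            while ((charsRead = rd.read(buf, 0, buf.length)) != -1) {
                response.append(buf, 0, charsRead);
            }
        }
        return response.toString();
    }
}
```

With the question's code this would be called as ChunkedBodyReader.readAll(connection.getInputStream()). The try-with-resources block also closes the underlying stream.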

As Phil already said, reading the stream character by character is rather slow. I prefer using the readLine() method of BufferedReader:
StringBuilder response = new StringBuilder();
String line;
while ((line = rd.readLine()) != null) {
    response.append(line).append(System.lineSeparator());
}
If possible, I would consider using the Apache HTTP Client library. It is easy to use and very powerful in handling HTTP stuff.
http://hc.apache.org/httpcomponents-client-ga/
You should also remember to set the socket (read) and connection timeouts. This way you can control how long a connection is kept open (at least on your side of the connection).
And last but not least, always close your HTTP connections in a finally block after you have received the response; otherwise you may run into a "too many open files" problem.
Hope this helps ;)
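The timeout advice above could look roughly like this with plain HttpURLConnection. This is a minimal sketch; the class name, method name, and timeout values are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutConfig {
    // Creates (but does not yet connect) an HttpURLConnection with both timeouts set.
    static HttpURLConnection open(String url, int connectMs, int readMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectMs); // max time to establish the connection
        conn.setReadTimeout(readMs);       // max time a single read may block waiting for data
        return conn;
    }
}
```

A caller would then do the request inside try { ... } finally { conn.disconnect(); } so the connection is released even when reading fails.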

Related

Java SSLSocket Outputstream message breaks

I have a problem where a message I write to the OutputStream of an SSLSocket is sent in multiple parts instead of one packet.
So, I have a SSLSocket:
final Socket newSocket = SSLSocketFactory.getDefault ().createSocket (server, port);
Since I connect to a special service, I manually send a non HTTP-standard connect header, which works just fine:
final OutputStream outputStream = newSocket.getOutputStream ();
outputStream.write (connect.getBytes ());
outputStream.flush ();
newSocket.setKeepAlive (true);
I then wait for a response, until I get a newline in it:
final InputStream inputStream = newSocket.getInputStream ();
final InputStreamReader inputStreamReader = new InputStreamReader (inputStream, "ISO-8859-1");
final BufferedReader bufferedReader = new BufferedReader (inputStreamReader);
I finish reading when I received a HTTP/1.1 200 Ok in a loop reading from the BufferedReader. Again, until this point everything is fine.
Then I begin using this socket to send messages from my own protocol like this:
outputStream.write (message.getBytes ());
outputStream.flush ();
When I now run tcpdump on the target device, I can see that the socket connection is established and that two packets are in fact sent. Assuming the message is Hello World, there is one packet, marked as malformed, containing only the H. Then another one follows with the content ello World. No exception is thrown while the program runs.
What could I possibly have done or missed to cause this kind of behaviour?
There are reasons why I cannot give you the complete code, but this question already contains all the significant passages.
Update:
As proposed in the comments, I tried to disable the Nagle algorithm using newSocket.setTcpNoDelay(true); just after creating the socket. But still, the packets sent are fragmented.

Java socket read blocking infinitely

I have a really strange issue while working with Java sockets. This problem is only happening for a VERY small subset of the urls that I am processing. Let's call an example url abc.com.
Edit: url is lists.wikimedia.org/robots.txt that gives me problems.
I can curl/netcat/telnet lists.wikimedia.org with path /robots.txt perfectly fine. Telnet even tells me the IP address for lists.wikimedia.org (see below). However, when I try to do the same using Java socket like the following:
Socket s = new Socket("208.80.154.4", 80); // IP is same as the IP printed by telnet
PrintWriter writer = new PrintWriter(s.getOutputStream()); // PrintWriter, since println() is used below
writer.println("HEAD /robots.txt HTTP/1.1");
writer.println("Host: lists.wikimedia.org");
writer.println("Connection: Keep-Alive");
writer.flush();
InputStreamReader r = new InputStreamReader(s.getInputStream());
BufferedReader reader = new BufferedReader(r);
String line;
while ((line = reader.readLine()) != null) {
...
}
The readLine blocks infinitely until the socket times out...
Does anyone have ANY idea why this might be happening? The same code works fine with most of the other URLs, and interestingly enough this bug only happens for some of the ROBOTS.TXT requests... I'm so confused why this might be happening.
Edit:
Interestingly enough, using apache HttpClient library gives me the correct result for lists.wikimedia.org/robots.txt. Is there something else I need to do if I want to manually do it via Socket?
You are probably missing the additional CRLF that ends the HTTP request header. I would also write the line endings explicitly, because println() uses the platform line separator, which is not CRLF everywhere. Like so (untested):
writer.print("HEAD /robots.txt HTTP/1.1\r\n");
writer.print("Host: lists.wikimedia.org\r\n");
writer.print("Connection: Keep-Alive\r\n");
writer.print("\r\n");
writer.flush();
Also consider using an HttpURLConnection instead of plain sockets; it takes all of this burden away:
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
connection.setRequestMethod("HEAD");
...
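To make the blank-line requirement concrete, here is a small sketch that builds the request as one string ending in CRLF CRLF. The class and method names are hypothetical, and it sends Connection: close rather than the question's Keep-Alive so that a readLine() loop eventually sees end of stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class HeadRequest {
    // Builds a HEAD request whose header block is terminated by an empty line.
    static String build(String host, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n" // assumption: one request per connection
             + "\r\n";                 // blank line: tells the server the headers are complete
    }

    // Writes the request bytes and flushes so nothing lingers in a buffer.
    static void send(OutputStream out, String host, String path) throws IOException {
        out.write(build(host, path).getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

With the question's socket this would be HeadRequest.send(s.getOutputStream(), "lists.wikimedia.org", "/robots.txt").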

HttpUrlConnection#setReadTimeout has no effect when POSTing large messages

I have a problem where an external HTTP server that I need to POST large messages to is having OutOfMemory issues. My HTTP client code is not timing out.
It is possible to reproduce this behaviour by using kill -STOP to pause the HTTP server process (to undo, use kill -CONT ).
Using the code below, I have found that if I keep my request small, the entire message is written to the output stream and getResponseCode times out as expected.
With a large message like the one below, the code ties up in the write to the output stream. I presume that I have filled the socket buffer. The code then never times out.
What I am looking for is a way of controlling the timeout when writing the request.
I have tried something similar using Apache HttpClient and got a similar result.
I tried running the Java code below in a different thread and interrupting it myself but the thread stays running.
I need to keep the streaming behaviour but I would appreciate any ideas into how I might be able to get the client code to time out.
Thanks,
PJ
URL url = new URL("http://unresponsive/path");
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
conn.setDoInput(true);
conn.setDoOutput(true);
conn.setUseCaches(false);
conn.setConnectTimeout(10000);
conn.setFixedLengthStreamingMode(4 * 1000000);
conn.setRequestProperty("Content-Length", "4000000");
conn.setReadTimeout(10000);
conn.setRequestMethod("POST");
OutputStream os = conn.getOutputStream();
for(int i = 0; i < 1000000; i++) {
if(i % 1000 == 0) {
System.out.println("write: " + i);
}
os.write("test".getBytes("us-ascii"));
}
os.close();
System.out.println("response-code: " + conn.getResponseCode());
InputStream is = conn.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
String line;
while((line = br.readLine()) != null) {
System.out.println(line);
}
is.close();
It appears that you are opening a connection and writing to the output stream. I think the confusion is about the roles of reading and writing: your code is not reading from an input stream when it hangs, so the read timeout has no effect and cannot rescue the stalled write.
If there is a way to time out the write, your code can be fixed that way.
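One possible way to time out a write is a watchdog: run the write on a worker thread and, if it has not finished in time, abort the connection from the calling thread. The sketch below shows only the generic mechanism, and all names in it are hypothetical. Closing a plain Socket from another thread is documented to make blocked I/O on it throw a SocketException. Whether conn.disconnect() reliably unblocks a stalled HttpURLConnection write is an assumption you would need to verify.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class WriteWatchdog {
    // Runs task on a worker thread. Returns true if it finished within timeoutMs;
    // otherwise calls abort (e.g. close the socket) and returns false.
    static boolean runWithTimeout(Runnable task, Runnable abort, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> result = worker.submit(task);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            abort.run();         // make the blocked I/O call fail
            result.cancel(true); // also interrupt, in case the task honours interrupts
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            abort.run();
            return false;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

In the question's code, the task would be the loop that writes to os and the abort action would be something like conn::disconnect. If the abort action does not actually unblock the write, the worker thread stays stuck, so this is worth testing against the paused server (kill -STOP) described in the question.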

How do I recognize EOF in Java Sockets?

I want to recognize the end of the data stream with Java sockets. When I run the code below, it just gets stuck and keeps running (it gets stuck at the value 10).
I also want the program to download binary files, but the last byte is always different, so I don't know how to stop the while loop programmatically.
String host = "example.com";
String path = "/";
Socket connection = new Socket(host, 80);
PrintWriter out = new PrintWriter(connection.getOutputStream());
out.write("GET "+ path +" HTTP/1.1\r\nHost: "+ host +"\r\n\r\n");
out.flush();
int dataBuffer;
while ((dataBuffer = connection.getInputStream().read()) != -1)
System.out.println(dataBuffer);
out.close();
Thanks for any hints.
Actually your code is not correct.
In HTTP 1.0 each connection is closed and as a result the client could detect when an input has ended.
In HTTP 1.1 with persistent connections, the underlying TCP connection remains open, so a client detects the end of a response in one of the following two ways:
1) The HTTP server sends a Content-Length header indicating the size of the response body. The client can use this to tell when the response has been fully read.
2) The response is sent with chunked transfer encoding, meaning that it comes in chunks, each prefixed with its size. Using this information, the client can reassemble the response from the chunks it receives from the server.
You should be using an HTTP client library, since implementing a generic HTTP client is not trivial (not at all, I may say).
To be specific, the code you posted should have followed one of the above approaches.
Additionally, you should read the headers line by line, since HTTP headers are line-terminated.
I.e. something like:
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String s;
while ((s = in.readLine()) != null) {
    // Read the HTTP headers
    if (s.isEmpty()) break; // blank line: no more headers
}
Sending a Connection: close header, as suggested by khachik, gets the job done (since the closing of the connection marks the end of input), but performance gets worse because you start a new connection for each request.
Whether that matters depends, of course, on what you are trying to do.
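As a rough illustration of approach 1, the sketch below reads the headers byte by byte, picks out Content-Length, and then reads exactly that many body bytes. It deliberately ignores chunked encoding and everything else a real client must handle, and the class and method names are made up for this example.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ContentLengthReader {
    // Reads one line terminated by LF (CR is dropped); returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') break;
            if (b != '\r') sb.append((char) b); // header bytes treated as ISO-8859-1
        }
        return (b == -1 && sb.length() == 0) ? null : sb.toString();
    }

    // Skips the status line and headers, then reads exactly Content-Length bytes.
    static byte[] readBody(InputStream in) throws IOException {
        int length = -1;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                length = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        if (length < 0) throw new IOException("no Content-Length header");
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) { // a single read() may return fewer bytes than requested
            int n = in.read(body, off, length - off);
            if (n == -1) throw new EOFException("stream ended before body was complete");
            off += n;
        }
        return body;
    }
}
```

Because it works on bytes, this also suits binary downloads. On a real socket, wrap the stream in a BufferedInputStream first so the single-byte reads do not each hit the socket.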
You should use existing libraries for HTTP. See here.
Your code works as expected. The server doesn't close the connection, and dataBuffer never becomes -1. This happens because connections are kept alive in HTTP 1.1 by default. Use HTTP 1.0, or put Connection: close header in your request.
For example:
out.write("GET "+ path +" HTTP/1.1\r\nHost: "+ host +"\r\nConnection: close\r\n\r\n");
out.flush();
int dataBuffer;
while ((dataBuffer = connection.getInputStream().read()) != -1)
System.out.print((char)dataBuffer);
out.close();

Java socket programming - stream get stuck

I am currently working on a simple proxy server, which receives an HTTP request from the browser, processes it, then forwards it to the desired web server.
I try to read the request from the input stream of the socket connected to the browser. Everything is fine except that the stream gets stuck after receiving the last block of data.
My code is in fact very simple, as shown below:
ServerSocket servSocket = new ServerSocket(8282);
Socket workSocket = servSocket.accept();
InputStream inStream = workSocket.getInputStream();
byte[] buffer = new byte[1024];
int numberRead = 0;
while ((numberRead = inStream.read(buffer, 0, 1024)) != -1){
System.out.println(new String(buffer, 0, numberRead)); // print only the bytes actually read
}
The loop simply never exits, even after the whole request has been received.
Is there any way to work around this problem?
Thanks in advance for any advice.
As the InputStream Javadoc says, read() blocks until data is available or the end of the stream is reached. So the other side of the socket needs to close it; only then will the inStream.read() call return -1.
Another approach is to send the size of the message first, so you know ahead of time how many bytes you have to read. Or you can use a BufferedReader to read from the socket line by line. Its readLine() method returns each time a full line has been read, which should work for you because HTTP request headers are neatly divided into lines.
It will loop until the connection is closed, and the client is probably waiting for an HTTP response from you and doesn't close it.
The browser is waiting for a response before it closes the connection.
Your read method, on the other hand, will block until the stream/connection is closed or new data is received.
This is not a direct fix for your current code.
As HTTP is a line-based protocol, you might want to use a BufferedReader and call readLine() on it.
When an HTTP request comes in, its header block is always concluded with a blank line, for example:
GET /someFile.html HTTP/1.1
Host: www.asdf.com
After sending that request, the client will wait for a response from the server before closing the connection. So if you want to parse the request, you are probably better off using a BufferedReader and reading full lines until you reach a blank line.
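A minimal sketch of that loop, stopping at the blank line instead of waiting for end of stream. The class and method names are illustrative. It reads only the request line and headers, so a request body (for example on POST) would still need Content-Length handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class RequestHeadReader {
    // Returns the request line and header lines, without the terminating blank line.
    static List<String> readHead(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        // Stop at the blank line; null means the client closed the connection early.
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }
}
```

In the proxy from the question, the reader would be created once per connection, e.g. new BufferedReader(new InputStreamReader(workSocket.getInputStream(), "ISO-8859-1")).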
