My code needs to download a large XML file (500 MB) through a GZIPInputStream and process it, performing some operations on every object. Those operations take time to complete, and I have many objects to process. I'm using Commons HttpClient 3.1 and StAX.
public void download(String url) throws HttpException, IOException,
        XMLStreamException, FactoryConfigurationError {
    GetMethod getMethod = new GetMethod(url);
    try {
        httpClient.executeMethod(getMethod);
        InputStream body = getMethod.getResponseBodyAsStream();
        Header contentEncoding = getMethod.getResponseHeader("Content-Encoding");
        if (contentEncoding != null && contentEncoding.getValue().indexOf("gzip") != -1) {
            body = new GZIPInputStream(body);
        }
        processStream(body);
    } finally {
        getMethod.releaseConnection();
    }
}

protected void processStream(InputStream inputStream) throws XMLStreamException, FactoryConfigurationError {
    XMLStreamReader xmlStreamReader = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
    // parses xml with StAX
    // executes some long operations for each object
}
When I run the code it works until, after two or three hours, I get a SocketException: Connection reset.
It looks like the server has closed the connection; is that correct? Is there a way to avoid this error without changing anything on the server side? If not, how can I handle it so that I don't have to re-run my application from the beginning?
com.ctc.wstx.exc.WstxIOException: Connection reset
at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
.................
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:182)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3037)
at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
One suggestion would be to cache the file locally and then process it later.
That is, your handler simply reads the stream and writes it to a temporary file on disk. It then closes the stream and processes the data from the temporary file.
This is probably a good approach anyway: even if you can keep the connection up, a network outage, reduced QoS and so on may make retrieving the file unreliable. You might also be preventing the server from updating the file for the entire duration of your processing, which is a bit anti-social.
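A minimal sketch of the cache-locally approach, using only the JDK. spool() copies the raw response body to a temporary file as fast as the network delivers it, and process() then parses the local copy with StAX, so the slow per-object work never keeps the HTTP connection waiting. The class and method names, the gzipped flag and the element-name counting are illustrative assumptions, not part of the original code. With HttpClient 3.1 you would presumably pass getMethod.getResponseBodyAsStream() to spool() before releaseConnection(), and set gzipped from the Content-Encoding header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SpoolThenProcess {

    /** Copies the whole response body to a temp file, then closes the network stream. */
    static Path spool(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("download-", ".xml.gz");
        try (InputStream in = body) {
            // REPLACE_EXISTING is needed because createTempFile already created the file
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    /** Parses the local copy with StAX; returns how many elementName elements were seen. */
    static int process(Path file, boolean gzipped, String elementName)
            throws IOException, XMLStreamException {
        int count = 0;
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        count++; // the long-running per-object work would go here
                    }
                }
            } finally {
                reader.close(); // XMLStreamReader is not AutoCloseable
            }
        }
        return count;
    }
}
```

Keeping the file compressed keeps disk usage close to the download size, and if processing fails halfway you can run process() again on the same file without downloading it again.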
If you cannot copy the XML to your local computer, check whether the connection timed out. Perhaps processing the XML takes so long that the connection is reset by one of the intermediate servers.
Related
I'm learning about HTTP requests in Java, and I'd like to know whether reading the response body is essential to keeping a connection alive.
Here's an example code block (which posts a message to some URL):
private void writeToConnection(String url, String msg) throws IOException {
    try {
        HttpURLConnection connection = open(url);
        // "Try with resources"
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(connection.getOutputStream()))) {
            writer.write(msg);
        }
        // Why do I need this line?
        IOUtils.readStringFromStream(connection.getInputStream());
        int code = connection.getResponseCode();
        System.out.println(String.format("Returned response code %d.", code));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Why is it necessary to read the input stream? The method readStringFromStream returns a string, but the string is not assigned to anything. Does this ensure the connection stays alive? If so, how does the connection stay alive when the first line in the method opens a new connection? If the next batch of data to be written invokes this method, wouldn't that discard the old connection and open a new one?
I believe the intent of this code is indeed to consume the response body so that the connection can be reused. However, I'm not sure that this approach is correct; it is also likely to depend on the version of Java you are using.
First, it should suffice to get the connection's InputStream and close it; behind the scenes, the body still needs to be read, but closing the stream signals to the connection handler that the application wants to skip the body, and the handler can read and discard the content before putting the connection into a cache for re-use.
However, depending on the status, there could be an error stream instead of an input stream. Even in this case, the body needs to be consumed before the connection can be re-used, but many applications (like this one) don't bother reading the body of an error message. Since Java 7, however, if the error body is small enough, it will be consumed and buffered automatically.
Behind the scenes, a connection cache is used to retain open connections. Although the method names suggest a new connection is opened every time, in fact the cache is first checked for an open connection.
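A minimal sketch of "consume the body, then close it" with plain java.net.HttpURLConnection. The class and method names (KeepAliveSketch, drainAndClose, post) and the UTF-8 POST body are hypothetical. It reads the status code first and then drains whichever stream carries the body: in the JDK's implementation getInputStream() throws for 4xx/5xx responses, whose body is available from getErrorStream() instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class KeepAliveSketch {

    /** Reads and discards whatever is left in the stream, closes it, and returns the byte count. */
    static long drainAndClose(InputStream in) throws IOException {
        if (in == null) {
            return 0; // getErrorStream() returns null when there is no error body
        }
        long discarded = 0;
        byte[] buffer = new byte[8192];
        try (InputStream stream = in) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                discarded += n;
            }
        }
        return discarded;
    }

    /** Hypothetical helper: POSTs msg and returns the status code, leaving the socket reusable. */
    static int post(String url, String msg) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(msg.getBytes(StandardCharsets.UTF_8));
        }
        int code = connection.getResponseCode();
        // For 4xx/5xx, getInputStream() throws; the body comes from getErrorStream().
        InputStream body = code >= 400 ? connection.getErrorStream() : connection.getInputStream();
        drainAndClose(body);
        // No disconnect() here: it may close the socket instead of returning it to the cache.
        return code;
    }
}
```

Because post() calls openConnection() every time, reuse happens behind the scenes: the JDK checks its keep-alive cache for an idle connection to the same host before opening a new socket.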
How can I detect that the client side of a tomcat servlet request has disconnected? I've read that I should do a response.getOutputStream().print(), then a response.getOutputStream().flush() and catch an IOException, but is there a way I can detect this without writing any data?
EDIT:
The servlet sends out a data stream that doesn't necessarily end, but doesn't necessarily have any data flowing through it (it's a stream of real time events). I need to actually detect when the client disconnects because I have some cleanup I have to do at that point (resources to release, etcetera). If I have the HttpServletRequest available, will trying to read from that throw an IOException if the client disconnects?
is there a way I can detect this without writing any data?
No, because TCP/IP offers no way to detect it without writing any data.
Don't worry about it. Just complete the request actions and write the response. If the client has disappeared, that will cause an IOException: connection reset, which will be thrown into the servlet container. There is nothing you have to do about that.
I need to actually detect when the client disconnects because I have some cleanup I have to do at that point (resources to release, etcetera).
That is what the finally block is for. It is executed regardless of the outcome, e.g.:
OutputStream output = null;
try {
    output = response.getOutputStream();
    // ...
    output.flush();
    // ...
} finally {
    // Do your cleanup here.
}
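For a stream that may stay silent for a long time, one way to combine "you only find out by writing" with "clean up in finally" is to write a small heartbeat whenever no event arrives for a while, and treat an IOException on write or flush as the disconnect signal. This is a sketch under assumptions: the events arriving on a BlockingQueue, the newline heartbeat and the names EventPump and pump are all hypothetical, and the heartbeat must be something your client is prepared to ignore. In a servlet, out would be response.getOutputStream().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class EventPump {

    // Hypothetical heartbeat payload; it must be something the client ignores.
    static final byte[] HEARTBEAT = "\n".getBytes(StandardCharsets.UTF_8);

    /**
     * Copies events from the queue to out until writing fails or the thread is interrupted.
     * If no event arrives within heartbeatMillis, a heartbeat is written instead, so a client
     * that has gone away eventually shows up as an IOException. cleanup always runs once.
     */
    static void pump(BlockingQueue<String> events, OutputStream out,
                     long heartbeatMillis, Runnable cleanup) {
        try {
            while (true) {
                String event = events.poll(heartbeatMillis, TimeUnit.MILLISECONDS);
                byte[] payload = event == null ? HEARTBEAT : event.getBytes(StandardCharsets.UTF_8);
                out.write(payload);
                out.flush(); // a reset connection is usually reported here, possibly one write late
            }
        } catch (IOException e) {
            // The client disconnected: stop pumping; cleanup below still runs.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
        } finally {
            cleanup.run();
        }
    }
}
```

The first write after a disconnect can still succeed because it only lands in local buffers; that is why the heartbeat is repeated rather than sent once.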
If I have the HttpServletRequest available, will trying to read from that throw an IOException if the client disconnects?
It depends on how you're reading from it and how much of the request body is already in server memory. For normal form-encoded requests, if you have called getParameter() beforehand, the body will usually have been fully parsed and stored in server memory, so calling getInputStream() won't be useful at all. Better to do it on the response instead.
Have you tried flushing the response buffer?
response.flushBuffer();
It seems to throw an IOException when the client has disconnected.
I'm trying to read some image files from a server using socket programming.
But I get a SocketTimeoutException when the file doesn't exist. Not only that, I also lose the connection to the server.
How can I avoid losing the connection to the server when the file doesn't exist?
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
InputStream bufferedInputStream = new BufferedInputStream(socket().getInputStream());
int i = -1;
while ((i = bufferedInputStream.read()) != -1) {
    byteArrayOutputStream.write(i);
}
In the above code I get the exception when I call read() on bufferedInputStream. How can I handle this exception and avoid losing the connection to the server?
Thanks
You don't get that exception 'when the file [doesn't] exist'. You get it when you have set a read timeout and no data has arrived within the timeout. If your timeout is too short, raise it; if you want to wait forever, remove it. It's your timeout: you set it.
If you get a SocketTimeoutException, you don't lose the connection. What makes you think you did?
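A sketch of the point that a read timeout does not close the socket: the Socket.setSoTimeout documentation states that when the timeout expires a SocketTimeoutException is raised but the socket is still valid, so the caller can catch it and read again. The class name PatientReader, the maxTimeouts limit and reading into a byte array (instead of byte-by-byte through a BufferedInputStream) are illustrative choices, not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PatientReader {

    /**
     * Reads until end of stream. Each read() that waits longer than timeoutMillis throws
     * SocketTimeoutException; the socket stays usable, so the read is retried, giving up
     * only after maxTimeouts consecutive timeouts.
     */
    static byte[] readAll(Socket socket, int timeoutMillis, int maxTimeouts) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        InputStream in = socket.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int consecutiveTimeouts = 0;
        while (true) {
            int n;
            try {
                n = in.read(buffer);
            } catch (SocketTimeoutException e) {
                if (++consecutiveTimeouts >= maxTimeouts) {
                    throw e; // waited long enough; let the caller decide what to do
                }
                continue; // the connection is still open, so just try again
            }
            if (n == -1) {
                return out.toByteArray(); // the server closed its side: all data received
            }
            out.write(buffer, 0, n);
            consecutiveTimeouts = 0;
        }
    }
}
```

If you would rather wait indefinitely, passing 0 as the timeout disables it, which matches the advice above to remove the timeout.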