How to detect dataloss with Java sockets?


I have the following situation: using a "classical" Java server (using ServerSocket) I would like to detect (as rapidly as possible) when the connection with the client has failed unexpectedly (i.e. non-gracefully / without a FIN packet).
The way I'm simulating this is as follows:
I'm running the server on a Linux box
I connect with telnet to the box
After the connection has succeeded I add a "DROP" rule in the box's firewall
What happens is that the sending blocks after ~10k of data. I don't know for how long, but I've waited more than 10 minutes on several occasions. What I've researched so far:
Socket.setSoTimeout - however this affects only reads. If there are only writes, it doesn't have an effect
Checking for errors with PrintWriter.checkError(), since PW swallows the exceptions - however it never returns true
How could I detect this error condition, or at least configure the timeout value? (either at the JVM or at the OS level)
Update: after ~20min checkError returned true on the PrintWriter (using the server JVM 1.5 on a CentOS machine). Where is this timeout value configured?

The ~20 min timeout is because of standard TCP settings in Linux. It's really not a good idea to mess with them unless you know what you're doing. I had a similar project at work, where we were testing connection loss by disconnecting the network cable and things would just hang for a long time, exactly like you're seeing. We tried messing with the following TCP settings, which made the timeout quicker, but it caused side effects in other applications where connections would be broken when they shouldn't, due to small network delays when things got busy.
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
If you check the man page for tcp (man tcp) you can read about what these settings mean and maybe find other settings that might apply. You can either set them directly under /proc/sys/net/ipv4 or use sysctl.conf. These two were the ones we found made the send/recv fail quicker. Try setting them both to 1 and you'll see the send call fail a lot faster. Make sure to take note of the current settings before changing them.
I will reiterate that you really shouldn't mess with these settings. They can have side effects on the OS and other applications. The best solution is like Kitson says, use a heartbeat and/or application level timeout.
Also look into how to create a non-blocking socket, so that the send call won't block like that. Although keep in mind that sending with a non-blocking socket is usually successful as long as there's room in the send buffer. That's why it takes around 10k of data before it blocks, even though you broke the connection before that.
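To see the send-buffer effect for yourself, here is a rough sketch using a non-blocking SocketChannel against a local peer that never reads. NonBlockingSendDemo and fillSendBuffer are names made up for this example, and the 8 KiB chunk size is arbitrary. A write() that returns 0 marks the point where a blocking socket would have hung.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical demo: shows how much data a send "succeeds" with before it would block. */
class NonBlockingSendDemo {
    /** Writes until the kernel buffers are full; returns the number of bytes accepted. */
    static long fillSendBuffer(SocketChannel channel) throws IOException {
        channel.configureBlocking(false);
        ByteBuffer chunk = ByteBuffer.allocate(8192);
        long total = 0;
        while (true) {
            chunk.clear(); // reuse the same 8 KiB of zero bytes
            int written = channel.write(chunk);
            if (written == 0) {
                return total; // buffers are full: a blocking write would hang here
            }
            total += written;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()
                     .bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
             SocketChannel client = SocketChannel.open(server.getLocalAddress());
             SocketChannel accepted = server.accept()) {
            // The accepted side never reads, which stands in for an unreachable peer.
            System.out.println("accepted before stalling: " + fillSendBuffer(client) + " bytes");
        }
    }
}
```

The byte count depends on the OS buffer sizes, so expect a different number from the ~10k in the question. The point is only that writes keep succeeding locally long after the peer has stopped responding.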

The only sure fire way is to generate application level "checks" instead of relying on the transport level. For example, a bi-directional heartbeat message, where if either end does not get the expected message, it closes and resets the connection.
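A minimal sketch of the receiving half of such a heartbeat, assuming the peer sends a line containing PING at least once per interval. The PING message, HeartbeatWatcher and peerWentSilent are all invented for illustration; the read timeout from Socket.setSoTimeout is simply reused as the heartbeat deadline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: declares the peer dead when no line arrives within the deadline. */
class HeartbeatWatcher {
    /**
     * Reads lines until the peer closes the stream or stays silent for longer than
     * timeoutMillis. Returns true if the peer went silent (heartbeat missed),
     * false if it closed the connection cleanly.
     */
    static boolean peerWentSilent(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis); // read deadline doubles as the heartbeat deadline
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // "PING" is the assumed heartbeat message; anything else is application data.
                if (!"PING".equals(line)) {
                    System.out.println("data: " + line);
                }
            }
            return false; // end of stream: graceful close (FIN received)
        } catch (SocketTimeoutException e) {
            return true; // neither heartbeat nor data arrived within the deadline
        } finally {
            socket.close(); // closing also makes a writer blocked on this socket fail
        }
    }
}
```

The other end would write "PING\n" on a timer and run the same watcher on its own input, which gives the bi-directional check described above. Pick a deadline of a few heartbeat intervals so that one delayed message does not reset a healthy connection.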

Related

Java/Android - Getting socket output stream breaks execution with no exception launched

I'm debugging the thread which manages a socket and I noticed that when I call getOutputStream() the debug breaks with no exception thrown. I've even added breakpoints to the very Socket.getOutputStream() but there is no way to get what's wrong.
The Java server correctly accepts the connection and waits for input (by checking inputStream.available()).
Both Socket and ServerSocket come from SSLSocketFactory and SSLServerSocketFactory.
I'm using Android Studio.
What am I doing wrong?
edit: I've even tried to change structure from Thread to AsyncTask but the result is the same. This is frustrating.

Debugging network connections is a bit tricky as time-outs may occur.
I am also unsure if breakpoints on non-app code (like Socket.getOutputStream()) will really work. The SDK code in Android Studio may be different from the one used by your devices, which means that breakpoints (which are set to a specific line) may end up in a totally different method (if they work at all).
Therefore I would suggest changing your code to add log statements and, if necessary, sleep commands to slow down the important parts.
For SSL traffic I strongly suggest to look also at the transferred data. There are apps capturing the traffic on-device that run without root permissions. Later you can then debug the traffic on the PC using Wireshark and see if the problem was caused by a communication problem between your client and the server.

Crystal Report java library, logon hangs forever

Our project uses Business Objects for reports. Our java webapps that launch reports go through a web service we set up to handle the business rules of how we want to launch them. Works great...with one wrinkle.
BO appears to be massively unreliable. The thing frequently goes down or fails to come up after a nightly timed restart. Our Ops team has sort of gotten used to this as a fact of life.
But the part of that which impacts me, on the java team, is our webservice tries to log on to BO, and instead of timing out or erroring like it should, the BO java library hangs forever. Evidently it is connecting to a half-started BO, and never gives up.
Looking around the internet, it appears that others have experienced this, but none of the things I see suggests how to set a timeout on the logon process so that if it fails, the web service doesn't lock up forever (which in turn can cause our app server to become unstable).
The connection is pretty simple:
session = CrystalEnterprise.getSessionMgr().logon(boUserName, boPassword, boServerName, boSecurityType);
All I am looking for is some way to make sure that if BO is dead, my webservice doesn't die with it. A timeout...a way to reliably detect if BO is not started and healthy before trying to logon....something. Our BO "experts" don't seem to think there is anything they can do about BO's instability and they know even less about the java library.
Ideas?

The Java SDK does not detail how to define a timeout when calling logon. I can only assume that this means it falls back on a default network connection timeout.
However, if a connection is made but the SDK doesn't receive the required information (and keeps waiting for an answer), a network timeout will never be reached as this is an application issue, not a network issue.
Therefore, the only thorough solution would be to deal with the instabilities in your BusinessObjects platform (for which you should create a separate question and describe the issue in more detail).
If this is not an option, an alternative could be to launch the connection attempt in a separate thread and implement a timeout yourself, killing the thread when the predefined timeout is reached and optionally retrying the connection attempt several times.
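A minimal sketch of that idea with the standard java.util.concurrent classes. TimedCall and callWithTimeout are hypothetical names, not part of the BO SDK. Note that Java cannot really kill a thread: Future.cancel(true) only interrupts it, so a logon call that ignores interruption stays stuck on its worker thread. The workers are daemon threads so that at least they cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking call on a worker thread and gives up after a deadline. */
class TimedCall {
    // Daemon threads, so a call that never returns cannot prevent JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Only interrupts the worker; a call that ignores interruption keeps running.
            future.cancel(true);
            throw e;
        }
    }
}
```

The logon line from the question would then be wrapped as a lambda, something like TimedCall.callWithTimeout(() -> CrystalEnterprise.getSessionMgr().logon(boUserName, boPassword, boServerName, boSecurityType), 30, TimeUnit.SECONDS), where the 30 seconds is an arbitrary choice. A TimeoutException tells the web service that BO is not answering. If BO hangs often, the stuck worker threads pile up, so retries should be limited.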
Keep in mind though that while the initial logon might be successful, the instabilities described in your question could cause other issues (e.g. a different SDK call could remain hanging forever due to the same issue that caused your logon call to hang).
Again, the only good solution is to look at the root cause of your platform instabilities.

How to keep jdbc to postgres alive

So I've been tracking a bug for a day or two now which happens out on a remote server that I have little control over. The ins and outs of my code are, I provide a jar file to our UI team, which wraps postgres and provides storage for data that users import. The import process is very slow for multiple reasons, one of which is that the users are importing unpredictable, large amounts of data (which we can't really cut down on). This has led to a whole plethora of timeout issues.
After some preliminary investigation, I've narrowed it down to the JDBC connection to the postgres database timing out. I had a lot of trouble replicating this on my local test setup, but have finally managed to by reducing the 'socketTimeout' of the connection properties to 10s (there's more than 10s between each call made on the connection).
My question now is, what is the best way to keep this alive? I've set the 'tcpKeepAlive' to true, but this doesn't seem to have an effect, do I need to poll the connection manually or something? From what I've read, I'm assuming that polling is automatic, and is controlled by the OS. If this is true, I don't really have control of the OS settings in the run environment, what would be the best way to handle this?
I was considering testing the connection each time it is used, and if it has timed out, I will just create a new one. Would this be the correct course of action or is there a better way to keep the connection alive? I've just taken a look at this post where people are suggesting that you should open and close a connection per query:
When my app loses connection, how should I recover it?
In my situation, I have a series of sequential inserts which take place on a single thread, if a single one fails, they all fail. To achieve this I've used transactions:
m_Connection.setAutoCommit(false);
m_TransactionSave = m_Connection.setSavepoint();
// Do something
m_Connection.commit();
m_TransactionSave = null;
m_Connection.setAutoCommit(true);
If I do keep reconnecting, or use a connection pool like PGBouncer (like someone suggested in comments), how do I persist this transaction across them?

JDBC connections to Postgres can be configured with a keep-alive setting. An issue was raised against this functionality here: JDBC keep alive issue. Additionally, there's the parameter help page.
From the notes on that, you can add the following to your connection parameters for the JDBC connection:
tcpKeepAlive=true;
Reducing the socketTimeout should make things worse, not better. The socketTimeout is a measure of how long a connection should wait when it expects data to arrive, but it has not. Making that longer, not shorter would be my instinct.
Is it possible that you are using PGBouncer? That process will actively kill connections from the server side if there is no activity.
Finally, if you are running on Linux, you can change the kernel's TCP keep-alive settings (net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes). I am sure something similar exists for Windows.
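If you cannot touch the OS settings, one option is the approach the question already considers: validate the connection before each unit of work and reconnect when it is dead. A rough sketch, assuming the PostgreSQL JDBC driver is on the classpath; PgConnectionHolder is a made-up wrapper and the 5 second validation timeout is arbitrary. Connection.isValid is standard JDBC 4.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical wrapper that re-opens the connection when it is found dead. */
class PgConnectionHolder {
    private final String url;
    private final Properties props = new Properties();
    private Connection connection;

    PgConnectionHolder(String url, String user, String password) {
        this.url = url;
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("tcpKeepAlive", "true"); // the driver setting named in the answer
    }

    Properties properties() {
        return props;
    }

    /** Call between transactions only: a reconnect cannot carry an open transaction over. */
    synchronized Connection get() throws SQLException {
        if (connection == null || !connection.isValid(5)) { // 5 s validation timeout (assumed)
            closeQuietly();
            connection = DriverManager.getConnection(url, props);
        }
        return connection;
    }

    private void closeQuietly() {
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException ignored) {
                // the connection is already broken; nothing useful to do
            }
            connection = null;
        }
    }
}
```

On the transaction question: a transaction lives on one connection and is rolled back by the server when that connection dies, so it cannot be persisted across a reconnect. Call get() once at the start of each batch of inserts, and if the connection fails midway, rerun the whole batch on the new connection.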

What is the best practice on socket programming -- do I do a close every time or leave it open?

I haven't found a clear answer on this one.
I have a client/server application in Java 7. The server and client are on separate computers. The client has a short (1 line of 10 characters) command to issue to the server and the server responds (120 character string). This will be repeated every X seconds, where X is the rate in the configuration file. This could be anywhere from 1 second to Integer.MAX_VALUE seconds.
Every time that I've created a client/server application, the philosophy has been create the connection, do the business, close the connection and then do whatever else with the data. This seems to be the way things should be done--especially when using the try with resources programming.
What are the hiccups with leaving a socket connection hanging out there for X seconds? Is it really a best practice to close down and restart or is it a better practice for the socket to remain connected and just send the command every X seconds?

I think the answer depends a bit on the number of clients you expect to have.
If you will never have very many client connections open, then I'd say leave the connection open and call it good, especially if latency is an issue - even on LANs, I've seen connections take several milliseconds to initialize. If you expect hundreds or thousands of clients to connect and do this, however, I would reconnect every time. As others have said, leaving blocking sockets open will often mean you have a thread left running for each one, and each thread can reserve a sizeable amount of stack space. Do this several thousand times and you will have a big problem on most machines.
Another issue is port space. Just because the TCP/IP stack gives us 65535 total ports doesn't mean all are usable - in fact, most local firewalls will prohibit most from being used, so even if you had enough memory to run thousands of simultaneous threads, you could very likely run out of ports if you leave a lot of connections open simultaneously.

IMHO the client should open, do its thing and then close.
on the server...
In UNIX one usually forks a process to answer the call (each call); however, on Windows one typically creates a new thread for each inbound call.

Is java.net.Socket.setSoTimeout reliable?

From the JavaDoc for setSoTimeout
Enable/disable SO_TIMEOUT with the
specified timeout, in milliseconds.
With this option set to a non-zero
timeout, a read() call on the
InputStream associated with this
Socket will block for only this amount
of time. If the timeout expires, a
java.net.SocketTimeoutException is
raised, though the Socket is still
valid. The option must be enabled
prior to entering the blocking
operation to have effect. The timeout
must be > 0. A timeout of zero is
interpreted as an infinite timeout.
From the variety of posts on the Internet I have read that SO_TIMEOUT is rather unreliable when using Socket C API ( e.g. here ).
Hence the question, is it reliable to use setSoTimeout to check for run-away sessions?
If not, what techniques can you recommend to put a time limit on socket sessions?

I don't know of any relevant recent/current operating system on which (stream) socket timeouts are not working as they are supposed to. The post you're linking to is from a rather confused poster who is trying to set a send timeout on a datagram socket, which makes absolutely no sense. Datagrams are either sent immediately or silently discarded.

I am not aware of any modern OS platform whose network stack is so broken that socket timeouts don't work. But if anyone knows of a real life example, please add it as a comment!
I would not worry about this scenario unless you are actually forced to support your application on such a broken OS. I suspect that it would be a painful exercise.

The link is about SO_RCVTIMEO. The question is about Socket.setSoTimeout(). In the only platform I am aware of where the former doesn't work (some versions of Solaris), the latter is fudged up using select(), which does work. The contract of the method demands it. You don't need to worry about this unless someone actually comes up with a platform where it doesn't. I've never seen one in 16 years.
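If you want to check the behaviour on your own platform, a small self-contained test is easy to write. The sketch below uses made-up names (SoTimeoutDemo, readTimesOut): it connects to a local server socket, never sends anything, and checks that read() gives up with a SocketTimeoutException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo of the read timeout described in the JavaDoc quoted in the question. */
class SoTimeoutDemo {
    /** Returns true if read() timed out, false if data or end of stream arrived first. */
    static boolean readTimesOut(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis); // must be set before entering the blocking read
        InputStream in = socket.getInputStream();
        try {
            in.read();
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the socket is still valid and can be read again
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // The client never writes, so the read on the accepted socket should time out.
            System.out.println("timed out: " + readTimesOut(accepted, 200));
        }
    }
}
```

Keep in mind that this limits a single read(), not the whole session. To cap total session time you would still track a deadline yourself and close the socket when it passes.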

Check out the connectivity classes in Java 6 nio, they include sockets now and do non-blocking operation so you can cancel an operation if you want to.
Apache HttpClient core (?) is now able to use the nio sockets, so it seems they got that concept working. That's all I know about it, though.
