ZeroMQ blocked in a context.term() call. Why? How to prevent? - java

I have a Java program that uses ZeroMQ.
I found that the program blocks in context.term() whenever receiving a message (recvMsg()) times out.
ZMQ.Context context = ZMQ.context(1);
ZMQ.Socket socket = context.socket(ZMQ.REQ);
socket.connect(mAddress);
ZMsg ZM = new ZMsg();
ZM.add(qString);
ZM.send(socket, true);
socket.setReceiveTimeOut(mTimeout);
ZMsg receivedZM = ZMsg.recvMsg(socket);
if(receivedZM != null) {
System.out.println(receivedZM.getFirst().toString());
}
socket.close();
context.term();
What causes it to block?
And how can I solve this problem?

ZeroMQ does a lot of work behind the scenes of the Context() factory.
I always advocate setting .setsockopt( ZMQ_LINGER, 0 ) right after a socket is instantiated, precisely because of this kind of behaviour, which otherwise stays outside your local code's control. A Context instance whose I/O thread(s) hang is one of those things you never want to see again. It can happen when .term() is issued before every socket created under that Context has been successfully closed (the .term() is supposed to dismantle the Context and release all of its system resources), or after an unhandled exception, when things have simply gone wrong.
Feel free to follow textbook and online snippet examples, but a serious distributed-system designer ought to take all reasonable steps to keep her/his code from falling into any deadlock state (least of all an unsalvageable one).
What is the reason?
As documentation states - it is a designed-in feature of ZeroMQ:
attempting to terminate the socket's context with zmq_ctx_term() shall block until all pending messages have been sent to a peer.
Whenever a message dispatched with .send() (dispatched only, which by no means implies it has already been put on the wire) is still in the local queue for any of the recognised (and potentially disconnected or busy) peer nodes, a .term() with the default configuration cannot proceed and will block.
What is the solution?
Newer API versions have started to document a default LINGER value other than -1 (infinite), but since you never know which version your code will interface with, an explicit call to .setsockopt( ZMQ_LINGER, 0 ) is a self-disciplining step and raises your team's awareness of how to build reliable distributed-system code.
Wrapping the calls in try / catch / finally handlers should go without saying here. You always have to design with failures and collisions in mind, don't you?

According to the API documentation, http://api.zeromq.org/4-2:zmq-term, it will block while there are still messages to transmit. This suggests that your other machine or process, the one that should open the REP socket, isn't running.

Related

Axis2 1.5.1 connections management

HTTP connections were not being used efficiently by our code, which uses Axis2 1.5.1. After setting a limit on the maximum connections per host and stressing the application, responsiveness was not as good as I expected given those limits, and sometimes connections got stuck indefinitely. Fewer and fewer connections were available each time, until the application served no requests at all.
Configuration:
MultiThreadedHttpConnectionManager connManager = new MultiThreadedHttpConnectionManager();
HttpConnectionManagerParams connectionManagerParams = connManager.getParams();
connectionManagerParams.setMaxTotalConnections(httpMaxConnections);
connectionManagerParams.setDefaultMaxConnectionsPerHost(httpMaxConnectionsPerHost);
HttpClient httpClient = new HttpClient(connManager);
ConfigurationContext axisContext;
try {
axisContext = ConfigurationContextFactory.createDefaultConfigurationContext();
} catch (Exception e) {
throw new AxisFault(e.getMessage());
}
axisContext.setProperty(HTTPConstants.CACHED_HTTP_CLIENT, httpClient);
service = new MyStub(axisContext, url);
ServiceClient serviceClient = service._getServiceClient();
serviceClient.getOptions().setProperty(HTTPConstants.CONNECTION_TIMEOUT, httpConnectionTimeout);
serviceClient.getOptions().setProperty(HTTPConstants.SO_TIMEOUT, httpReadTimeout);
serviceClient.getOptions().setProperty(HTTPConstants.REUSE_HTTP_CLIENT, Constants.VALUE_TRUE);
So, as you can see, we're defining max. connections and timeouts.
I have a workaround that I will share, hoping to help somebody who is in a hurry, as I was. I'll mark my answer as accepted in a few days if no better answer comes from the experts.
1) PoolTimeout to deal with connections that got stuck (for any reason)
The following line kept Axis2 from losing connections that got stuck forever:
httpClient.getParams().setParameter(HttpClientParams.CONNECTION_MANAGER_TIMEOUT, 1000L);
Let's call it PoolTimeout in this entry. Make sure the value is a Long, since an Integer (or int) would raise a ClassCastException that prevents the service call from ever leaving your client.
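Why the Long matters can be illustrated without Axis2 at all. The following is only a sketch under an assumption: that the parameter is stored as a plain Object and later read back with a cast to Long, which is what the ClassCastException suggests. The map and the readTimeout method are hypothetical stand-ins, not part of HttpClient's API.

```java
import java.util.HashMap;
import java.util.Map;

class LongParameterSketch {

    /** Reads a timeout that was stored as an Object, the way a generic parameter map would hold it. */
    static long readTimeout(Map<String, Object> params, String name) {
        // The cast only succeeds if the stored value was boxed as a Long.
        return (Long) params.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();

        params.put("timeout", 1000L); // boxed as Long: the cast succeeds
        System.out.println(readTimeout(params, "timeout"));

        params.put("timeout", 1000);  // boxed as Integer: the cast fails
        try {
            readTimeout(params, "timeout");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException for the Integer value");
        }
    }
}
```

Writing the literal as 1000L, or wrapping a variable in Long.valueOf(...), keeps the stored type unambiguous.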
The system you're developing, which uses Axis, could in turn be a client of another system, and that other system will certainly have its own ConnectionTimeout. So I suggest
PoolTimeout <= ConnectionTimeout
Example:
serviceClient.getOptions().setProperty(HTTPConstants.CONNECTION_TIMEOUT, httpConnectionTimeout);
httpClient.getParams().setParameter(HttpClientParams.CONNECTION_MANAGER_TIMEOUT, Long.valueOf(httpConnectionTimeout) );
2) Connections release
I was using Amila's suggestion for connection management, but the connections were not released as quickly as I expected (I had deliberately set the response delays of the mocked external system to match the limits in my tuning configuration).
I found that the following lines, in method org.apache.axis2.client.OperationClient.executeImpl(boolean), mark the connection as available in the pool as soon as it has been used:
HttpMethod method = (HttpMethod) getOperationContext().getMessageContext(WSDLConstants.MESSAGE_LABEL_OUT_VALUE)
.getProperty(HTTPConstants.HTTP_METHOD);
method.releaseConnection();
That's what Axis tries to do when serviceClient.cleanupTransport() is called, but it seems the context is not the correct one.
Now performance tuning works in a predictable way, so it is up to our integrators to select the tuning configuration that best suits production needs.
A better answer will be highly appreciated.

JVM crash because of lock on nfs file after network outage

The following code snippet causes a JVM crash if a network outage occurs after the lock has been acquired:
while (true) {
    // file shared over NFS
    String filename = "/home/amit/mount/lock/aLock.txt";
    RandomAccessFile file = new RandomAccessFile(filename, "rws");
    System.out.println("file opened");
    FileLock fileLock = file.getChannel().tryLock();
    if (fileLock != null) {
        System.out.println("lock acquired");
    } else {
        System.out.println("lock not acquired");
    }
    try {
        // wait for 30 sec
        Thread.sleep(30000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    System.out.println("closing filelock");
    if (fileLock != null) { // tryLock() returns null when another process holds the lock
        fileLock.close();
    }
    System.out.println("closing file");
    file.close();
}
Observation: JVM receives KILL(9) signal and exits with exit code 137(128+9).
Probably after network connection re-establishment something goes wrong in file-descriptor tables.
This behavior is reproducible with system call flock(2) and shell utility flock(1).
Any suggestion/work-arounds?
PS: using Oracle JDK 1.7.0_25 with NFSv4
EDIT:
This lock will be used to identify which process is active in a distributed high-availability cluster.
The exit code is 137.
What do I expect?
A way to detect the problem, close the file, and try to re-acquire the lock.
Exit code 138 does NOT hint at SIGKILL. It means signal 10, which can be SIGBUS (on Solaris) or SIGUSR1 (on Linux). Unfortunately, you don't tell us which one you're using.
In theory, NFS should handle everything transparently: the machine crashes, reboots, and clears the locks. In practice, I've never seen this work well in NFS3, and NFS4 (which you're using) makes things even harder, as there's no separate lockd() and statd().
I'd recommend you run truss (Solaris) or strace (Linux) on your Java process, then pull the network plug, to find out what's really going on. But to be honest, locking on NFS file systems is something people have recommended against for as long as I've been using Unix (more than 25 years by now), and I'd strongly recommend you write a small server program that handles the "who does what" question. Let your clients connect to the server, let them send "starting with X" and "stopping to do X" messages to the server, and have the server gracefully time out the connection if a client doesn't answer for more than, say, 5 minutes. I'm 99% sure this will take you less time than trying to fix NFS locking.
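The bookkeeping such a small server would need can be sketched in a few lines. This is only the in-memory core, with no networking, and everything in it (the class LeaseRegistrySketch, the method names, passing the current time as a parameter) is a hypothetical illustration rather than an existing library.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory core of a hypothetical "who does what" coordinator. */
class LeaseRegistrySketch {

    private static final class Lease {
        final String owner;
        final long lastSeenMillis;

        Lease(String owner, long lastSeenMillis) {
            this.owner = owner;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final long timeoutMillis;
    private final Map<String, Lease> leases = new HashMap<>();

    LeaseRegistrySketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * "Starting with X": grants the task if it is free, already held by this owner
     * (which doubles as a heartbeat), or held by an owner that has timed out.
     */
    synchronized boolean start(String task, String owner, long nowMillis) {
        Lease current = leases.get(task);
        boolean free = current == null
                || current.owner.equals(owner)
                || nowMillis - current.lastSeenMillis > timeoutMillis;
        if (free) {
            leases.put(task, new Lease(owner, nowMillis));
        }
        return free;
    }

    /** "Stopping to do X": releases the task, but only for its current owner. */
    synchronized void stop(String task, String owner) {
        Lease current = leases.get(task);
        if (current != null && current.owner.equals(owner)) {
            leases.remove(task);
        }
    }
}
```

The clock is passed in explicitly so the timeout logic can be exercised without waiting; a real server would presumably use System.currentTimeMillis() and expose start and stop over a socket.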
After an NFS server reboots, all clients holding active file locks start the lock reclamation procedure, which lasts no longer than the so-called "grace period" (just a constant). If reclamation fails during the grace period, the NFS client (usually a kernel-space beast) sends SIGUSR1 to any process that could not recover its locks. That's the root of your problem.
When the lock succeeds on the server side, rpc.lockd on the client system requests another daemon, rpc.statd, to monitor the NFS server that implements the lock. If the server fails and then recovers, rpc.statd will be informed. It then tries to reestablish all active locks. If the NFS server fails and recovers, and rpc.lockd is unable to reestablish a lock, it sends a signal (SIGUSR1) to the process that requested the lock.
http://menehune.opt.wfu.edu/Kokua/More_SGI/007-2478-010/sgi_html/ch07.html
You're probably wondering how to avoid this. Well, there are a couple of ways, but none is ideal:
Increase the grace period. AFAIR, on Linux it can be changed via /proc/fs/nfsd/nfsv4leasetime.
Install a SIGUSR1 handler in your code and do something smart there. For instance, the signal handler could set a flag denoting that lock recovery has failed. If this flag is set, your program can wait for the NFS server to become ready (for as long as it needs) and then try to recover the locks itself. Not very fruitful...
Do not use NFS locking ever again. If possible, switch to ZooKeeper, as was suggested earlier.
This behavior is reproducible with system call flock(2) and shell utility flock(1).
Since you're able to reproduce it outside of Java, it sounds like an infrastructure issue. You didn't give too much information on your NFS server or client OS, but one thing that I've seen cause weird behavior with NFS is incorrect DNS configuration.
Check that the output from "uname -n" and "hostname" on the client match your DNS records. Check that the NFS server is resolving DNS correctly.
Like Guntram, I too advise against using NFS for this sort of thing. I would use either Hazelcast (no server, instances cluster dynamically) or ZooKeeper (you need to set up a server).
With Hazelcast, you can do this to acquire an exclusive cluster-wide lock:
import com.hazelcast.core.Hazelcast;
import java.util.concurrent.locks.Lock;

Lock lock = Hazelcast.getLock(myLockedObject);
lock.lock();
try {
    // do something here
} finally {
    lock.unlock();
}
It also supports timeouts:
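import java.util.concurrent.TimeUnit;

// tryLock(long, TimeUnit) throws InterruptedException, which the caller must handle
if (lock.tryLock(5000, TimeUnit.MILLISECONDS)) {
    try {
        // do some stuff here..
    } finally {
        lock.unlock();
    }
}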

Selector on Android sockets behaves strangely

Prerequisites: Android 2.2 emulator.
I have perfectly working Java code that also compiles fine for Android. But here comes the strange part: it seems that java.nio.Selector doesn't work at all.
The first problem arises during connection. The following code works on Java but doesn't work on Android (see below for details).
socketChannel.configureBlocking(false);
socketChannel.connect(new InetSocketAddress(remoteAddr, getRemotePort()));
Selector selector = Selector.open();
socketChannel.register(selector, socketChannel.validOps());
// Wait for an event
int selRes = selector.select(timeout);
if (selRes == 1)
{
    SelectionKey selKey = (SelectionKey)selector.selectedKeys().iterator().next();
    if (selKey.isValid() && selKey.isConnectable()) {
        // Get channel with connection request
        boolean success = socketChannel.finishConnect();
        if (!success) {
            selKey.cancel();
        }
    }
}
I pass a timeout of 30000 (msec, i.e. 30 sec), but select returns immediately with selRes equal to 0 (on desktop Java it's 1). Switching the socket to blocking mode works fine (so addresses, ports and other settings are OK).
OK, I left the connection blocking (for now). But now my accept stopped working: the Selector doesn't report incoming connections. Again, getting rid of the Selector by using a blocking socket works.
So the question is: does Selector work at all on Android, or should the code be rewritten to avoid Selector and java.nio altogether?
The following code works on Java
This code has major problems on any platform.
You aren't clearing the selected-key set. Normally this is done by iterating over it and calling Iterator.remove(). Since you aren't iterating, you should at least call selectedKeys().clear(), although you really should be iterating: see below.
You shouldn't register with interestOps=validOps(). You should register OP_CONNECT until finishConnect() returns true, and thereafter either OP_READ or OP_WRITE, depending on what you want to do next.
If the connection doesn't succeed, finishConnect() throws an IOException, on which you should close the channel. You aren't doing that.
If the connection hasn't finished yet, finishConnect() returns false, in which case you should just keep selecting. It doesn't make any sense to cancel the key at that point.
If selRes > 1 you aren't processing any selected keys at all. The test should be if (selRes > 0), and it isn't really necessary, as iterating over the selected-key set will just iterate zero times; however, selRes == 0 does indicate that select() timed out, which can be useful if you want to handle timeouts.
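The points above could be put together roughly as follows. This is a minimal sketch using only java.nio; the class and method names are made up for illustration, and for brevity it closes the channel before returning, whereas a real client would keep it open and switch its interest to OP_READ or OP_WRITE.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class NonBlockingConnectSketch {

    /**
     * Returns true if the connection completes within timeoutMillis, false on timeout.
     * A failed connection surfaces as an IOException; try-with-resources closes the channel.
     */
    static boolean connect(InetSocketAddress remote, long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            if (channel.connect(remote)) {
                return true; // connected immediately, which can happen for local addresses
            }
            // Register for OP_CONNECT only, not validOps().
            channel.register(selector, SelectionKey.OP_CONNECT);
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (true) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // timed out
                }
                selector.select(remaining);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // clear the selected-key set as we go
                    // finishConnect() returning false means "not yet": keep selecting.
                    if (key.isValid() && key.isConnectable() && channel.finishConnect()) {
                        return true;
                    }
                }
            }
        }
    }

    /** Usage example: connects to a throwaway listener on the loopback interface. */
    static boolean demoAgainstLoopback() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return connect((InetSocketAddress) server.getLocalAddress(), 5000);
        }
    }
}
```

The deadline loop is a design choice of this sketch: it keeps the overall timeout meaningful even if select() wakes up before the connection has finished.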
The problem has a weird solution, found in a seemingly unrelated bug report in the Android bug tracker. The Android emulator doesn't support IPv6, and although I don't intend to request IPv6, it seems that by default Selector attempts to work on the IPv6 stack.
Once the following lines are added, my code starts to work correctly:
java.lang.System.setProperty("java.net.preferIPv4Stack", "true");
java.lang.System.setProperty("java.net.preferIPv6Addresses", "false");

What should I do if an IOException is thrown?

I have the following 3 lines of the code:
ServerSocket listeningSocket = new ServerSocket(earPort);
Socket serverSideSocket = listeningSocket.accept();
BufferedReader in = new BufferedReader(new InputStreamReader(serverSideSocket.getInputStream()));
The compiler complains about all 3 of these lines, and the complaint is the same for each: unreported exception java.io.IOException. In more detail, these exceptions are thrown by new ServerSocket, accept() and getInputStream().
I know I need to use try ... catch .... But for that I need to know what these exceptions mean in each particular case (how I should interpret them). When do they happen? I mean not in general, but in these 3 particular cases.
You can't know in particular, because IOException is a "generic" exception that can technically have many causes. It means an unexpected issue around input/output happened, but obviously the causes on a local hard disk differ from those on the internet.
In general, all three items revolve around sockets, so the causes are related to network issues. Possible causes are:
No network at all, not even localhost (would be a serious technical issue).
Port already in use, when a port number is given (new ServerSocket(earPort)).
Network issues, for example someone stumbled over the cable. Can also be caused by bad line quality, a DDoS attack etc.
Port exhaustion - no client side port available for a new connection.
Basically around this line.
The same will happen or be able to happen whenever you actually do something with the streams.
In this case you have two possible main causes:
First line: the port is already in use (program started twice, or same port as another program). This is normally not fixable unless the user does something.
Generic later runtime error. These can happen during normal operations.
The simplest way is to declare your calling method to throw IOException, but you need to cleanup allocated resources in finally clauses before you leave your method:
public void doSession() throws IOException
{
    final ServerSocket listeningSocket = new ServerSocket(earPort);
    try
    {
        final Socket serverSideSocket = listeningSocket.accept();
        try
        {
            final BufferedReader in =
                new BufferedReader(
                    new InputStreamReader(
                        serverSideSocket.getInputStream()
                    )
                );
            // read from 'in' here
        }
        finally
        {
            serverSideSocket.close();
        }
    }
    finally
    {
        listeningSocket.close();
    }
}
In general it doesn't matter exactly what caused the initial IOException because there's little your app can do to correct the situation.
However, as a general answer to your question of "what to do", you have a few options.
Try Again - May work if the problem was intermittent. Remember to supply a break condition in case it doesn't.
Try Something Else - Load the resource from a different location or via a different method.
Give Up - Throw/rethrow the exception and/or abort the action or perhaps the entire program. You may want to provide a user friendly message at this point... ;-) If your program requires the input to function then not having the input leaves you little choice but not to function.
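The "Try Again" option with its break condition, falling back to "Give Up", might look like the following. It is a sketch rather than a recommended utility: the IoAction interface, the withRetries name and the fixed pause between attempts are all assumptions made for the example.

```java
import java.io.IOException;

class RetrySketch {

    /** Hypothetical functional interface for an I/O action that may fail. */
    interface IoAction<T> {
        T run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, pausing between failed attempts.
     * If every attempt fails, the last IOException is rethrown ("Give Up").
     */
    static <T> T withRetries(IoAction<T> action, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // break condition reached: give up
    }
}
```

With the code from the question, a call could look like withRetries(() -> new ServerSocket(earPort), 3, 1000), although retrying only helps if the port is expected to become free again.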

Issues receiving in RXTX

I've been using RXTX for about a year now, without too many problems. I just started a new program to interact with a new piece of hardware, so I reused the connect() method I've used on my other projects, but I have a weird problem I've never seen before.
The Problem
The device works fine, because when I connect with HyperTerminal, I send things and receive what I expect, and Serial Port Monitor(SPM) reflects this.
However, when I run the simple HyperTerminal-clone I wrote to diagnose the problem I'm having with my main app, bytes are sent, according to SPM, but nothing is received, and my SerialPortEventListener never fires. Even when I check for available data in the main loop, reader.ready() returns false. If I ignore this check, then I get an exception, details below.
Relevant section of connect() method
// Configure and open port
port = (SerialPort) CommPortIdentifier.getPortIdentifier(name)
        .open(owner, 1000);
port.setSerialPortParams(baud, databits, stopbits, parity);
port.setFlowControlMode(fc_mode);
final BufferedReader br = new BufferedReader(
        new InputStreamReader(
                port.getInputStream(),
                "US-ASCII"));
// Add listener to print received characters to screen
port.addEventListener(new SerialPortEventListener() {
    public void serialEvent(SerialPortEvent ev) {
        try {
            System.out.println("Received: " + br.readLine());
        } catch (IOException e) { e.printStackTrace(); }
    }
});
port.notifyOnDataAvailable(true);
Exception
java.io.IOException: Underlying input stream returned zero bytes
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:268)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.read(BufferedReader.java:157)
at <my code>
The big question (again)
I think I've eliminated all possible hardware problems, so what could be wrong with my code, or the RXTX library?
Edit: something interesting
When I open HyperTerminal after sending a bunch of commands from java that should have gotten responses, all of the responses appear immediately, as if they had been put in the buffer somewhere, but unavailable.
Edit 2: Tried something new, same results
I ran the code example found here, with the same results. No data came in, but when I switched to a new program, it came all at once.
Edit 3
The hardware is fine, and even a different computer has the same problem. I am not using any sort of USB adapter.
I've started using PortMon, too, and it's giving me some interesting results. HyperTerminal and RXTX are not using the same settings, and RXTX always polls the port, unlike HyperTerminal, but I still can't see what settings would affect this. As soon as I can isolate the configuration from the constant polling, I'll post my PortMon logs.
Edit 4
Is it possible that some sort of Windows update in the last 3 months could have caused this? It has screwed up one of my MATLAB mex-based programs once.
Edit 5
I've also noticed some things that are different between HyperTerminal, RXTX, and a separate program I found that communicates with the device (but doesn't do what I want, which is why I'm rolling my own program)
HyperTerminal - set to no flow control, but Serial Port Monitor's RTS and DTR indicators are green
Other program - not sure what settings it thinks it's using, but only SPM's RTS indicator is green
RXTX - no matter what flow control I set, only SPM's CTS and DTR indicators are on.
From Serial Port Monitor's help files (paraphrased):
the indicators display the state of the serial control lines
RTS - Request To Send
CTS - Clear To Send
DTR - Data Terminal Ready
OK, sorry it's taken me so long to come back to this question. Here's how I got things working.
Note: This method will NOT work for everyone, please read below before copy/pasting into your own code
public void connect(CommPortIdentifier portId) throws Failure {
if (portId == null)
throw new Failure("No port set");
try { port = (SerialPort) portId.open(getClass().getName(), 10000); }
catch (PortInUseException e) {
throw new Failure("Port in use by " + e.currentOwner,e); }
try {
port.setSerialPortParams(9600, SerialPort.DATABITS_8,
SerialPort.STOPBITS_1, SerialPort.PARITY_NONE);
port.setFlowControlMode(SerialPort.FLOWCONTROL_RTSCTS_IN
| SerialPort.FLOWCONTROL_RTSCTS_OUT);
} catch (UnsupportedCommOperationException e) { throw new Failure(e); }
port.setRTS(true);
// More setup
}
So, in my case, the problem was that my particular device requires RTS flow control. Other devices may require different things (CTS, XON/XOFF), so check that device's manual. By default, RXTX disables all flow control mechanisms (unlike Hypertrm or other programs). Enabling each one is a two-step process.
Once you have a SerialPort object, call the setFlowControlMode() method, and bitwise-OR ('|') the necessary SerialPort.FLOWCONTROL_ constants
Set the appropriate flow control to true or false (like I did with port.setRTS(true))
For the others with similar problems, if this doesn't work, I suggest
Using a serial port monitoring program like Serial Port Monitor and/or PortMon (both Windows) to see what is actually going on.
Emailing the RXTX developers at rxtx@qbang.org (they are very helpful)
There is a simpler solution to this problem. This is what I did:
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line;
while (keepRunning) {
try {
while ((br.ready()) && (line = br.readLine()) != null) {
....
}
If you check that the buffer "is ready" before you read it there should be no problem.
Ok, I do realize this thread is extremely old, but none of these solutions worked for me. I had the same problem and I tried everything to fix it, to no avail. Then I did some research on what causes the problem, and, when not dealing with Serial Communication, it happens at the end of a file. So, I figured I needed to add an ending to whatever is being received by the Java Application, specifically, a line return (\n). And sure enough, it fixed the problem for me! Hopefully this helps someone new, as I'm not expecting this to help anyone already on this thread...
(might be too simple, but might as well start somewhere...)
Is the port in use? Rather than:
port = (SerialPort) CommPortIdentifier.getPortIdentifier(name)
.open(owner,1000);
what about:
CommPortIdentifier portIdentifier;
try {
portIdentifier = CommPortIdentifier.getPortIdentifier(name);
} catch (NoSuchPortException nspe) {
// handle?
}
if (portIdentifier.isCurrentlyOwned()) {
// handle?
}
port = portIdentifier.open(owner, 1000);
if (!(port instanceof SerialPort)) {
// handle?
}
Are you swallowing any exceptions?
I tried RXTX a few months ago and ran into similar problems. I suggest two things:
Create a virtual COM port using com0com. Enable trace logging. Compare the logs for when you use HyperTerminal versus when you run your own program. The difference will highlight what you are doing wrong.
In my humble opinion, RXTX's design is flawed and its implementation is quite buggy (take a look at its source-code, what a mess!). I've published an alternative library at http://kenai.com/projects/jperipheral with the following caveats: It's Windows-only and there are no pre-built binaries. Both of these will change in the near future. If you are interested in trying it out send me an email using http://desktopbeautifier.com/Main/contactus and I'll send you a pre-built version.
If anyone is still getting java.io.IOException: Underlying input stream returned zero bytes after reading characters with br.readLine() for RXTX (even when checking first whether br.readLine() == null), just apply this simple fix with a try/catch:
String line;
while (true) {
    try {
        line = br.readLine();
    } catch (IOException e) {
        System.out.println("No more characters received");
        break;
    }
    if (line == null) {
        break; // end of stream
    }
    // Print the line read
    if (line.length() != 0)
        System.out.println(line);
}
I've done some searching and it appears that this is the best/easiest way to get around this problem.
EDIT : I take that back. I tried this and still ended up having some problems. I'd recommend working with the raw InputStream directly, and implementing your own read/readLine method using InputStream.read(). That worked for me.
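A hand-rolled readLine over the raw InputStream could look something like this. It is only a sketch: the class name is made up, it assumes single-byte US-ASCII text as in the question, and it treats a read() result of -1 as "nothing more for now", which is an assumption about how the serial stream signals a timeout or end of data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RawLineReaderSketch {

    /**
     * Reads bytes until '\n' and returns the line without its terminator.
     * Carriage returns are dropped. Returns null if the stream reports -1
     * before any byte of a new line has arrived.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return decode(buffer);
            }
            if (b != '\r') {
                buffer.write(b);
            }
        }
        // No terminator seen: hand back whatever was collected, or null if nothing was.
        return buffer.size() > 0 ? decode(buffer) : null;
    }

    private static String decode(ByteArrayOutputStream buffer) {
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for port.getInputStream(); the sample bytes are invented for the demo.
        InputStream in = new ByteArrayInputStream("OK\r\nREADY\n".getBytes(StandardCharsets.US_ASCII));
        String line;
        while ((line = readLine(in)) != null) {
            System.out.println("Received: " + line);
        }
    }
}
```

Working on bytes directly avoids the InputStreamReader layer that raises "Underlying input stream returned zero bytes" in the stack trace from the question.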
