In my Java 8 application (RHEL 6.x, WildFly 10.1.0.Final), the first time a user prints a document the application gets stuck while retrieving the list of printers from the system.
Here is the stack trace of the blocking thread:
"Thread-211" #799 daemon prio=5 os_prio=0 tid=0x00007fca543a6800 nid=0x10755 runnable [0x00007fca02820000]
java.lang.Thread.State: RUNNABLE
at sun.print.CUPSPrinter.canConnect(Native Method)
at sun.print.CUPSPrinter.isCupsRunning(CUPSPrinter.java:444)
at sun.print.UnixPrintServiceLookup.getDefaultPrintService(UnixPrintServiceLookup.java:650)
- locked <0x00000006d2c7fff8> (a sun.print.UnixPrintServiceLookup)
at sun.print.UnixPrintServiceLookup.refreshServices(UnixPrintServiceLookup.java:277)
- locked <0x00000006d2c7fff8> (a sun.print.UnixPrintServiceLookup)
at sun.print.UnixPrintServiceLookup$PrinterChangeListener.run(UnixPrintServiceLookup.java:947)
Other users trying to print documents, and the related threads, are blocked by this one.
I looked at the source code of CUPSPrinter.canConnect() (native code); at this point it tries to connect to the CUPS server:
/*
 * Checks if connection can be made to the server.
 *
 */
JNIEXPORT jboolean JNICALL
Java_sun_print_CUPSPrinter_canConnect(JNIEnv *env,
                                      jobject printObj,
                                      jstring server,
                                      jint port)
{
    const char *serverName;
    serverName = (*env)->GetStringUTFChars(env, server, NULL);
    if (serverName != NULL) {
        http_t *http = j2d_httpConnect(serverName, (int)port);
        (*env)->ReleaseStringUTFChars(env, server, serverName);
        if (http != NULL) {
            j2d_httpClose(http);
            return JNI_TRUE;
        }
    }
    return JNI_FALSE;
}
In my case CUPS is on the same host listening on port 631.
I checked the logs & everything seems to be fine.
I also checked the active CUPS connections with netstat:
tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN 76107/cupsd
tcp 0 0 127.0.0.1:45652 127.0.0.1:631 TIME_WAIT -
tcp 0 0 :::631 :::* LISTEN 76107/cupsd
tcp 0 0 ::1:35982 ::1:631 TIME_WAIT -
tcp 0 0 ::1:35981 ::1:631 TIME_WAIT -
tcp 0 0 ::1:35978 ::1:631 TIME_WAIT -
tcp 0 0 ::1:35979 ::1:631 TIME_WAIT -
udp 0 0 0.0.0.0:631 0.0.0.0:* 76107/cupsd
Important notes:
If I restart the CUPS service, the thread is not unblocked. It seems to live endlessly until the application restarts.
I found a similar bug on OpenJDK: https://bugs.openjdk.java.net/browse/JDK-6290446 But the workaround of setting -Dsun.java2d.print.polling=false does not work for me (the property seems to be cleared at some point for an obscure reason, so PrinterChangeListener gets instantiated and polling is therefore not disabled). A sketch of forcing the property programmatically is included at the end of this question.
I can't reproduce the problem with a test application (a clone of production) on the same server.
Please HELP!!
-Djava.awt.printerjob=sun.print.PSPrinterJob
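Since the polling property appears to be reset before the lookup runs, one thing worth trying is forcing it programmatically as early as possible and guarding the first printer lookup with a timeout, so user threads are never stuck behind a hung CUPS connection. This is only a minimal defensive sketch, not a fix for the underlying hang; the PrinterGuard class name and the timeout handling are my own choices, not part of the application above.

import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: re-asserts the polling workaround and keeps callers
// from blocking indefinitely behind a hung CUPS lookup.
public class PrinterGuard {

    static {
        // Re-set the workaround in code in case something clears the -D value later.
        System.setProperty("sun.java2d.print.polling", "false");
    }

    private static final ExecutorService LOOKUP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "print-service-lookup");
        t.setDaemon(true); // must not keep the JVM alive
        return t;
    });

    // Returns the printer list, or an empty array if CUPS does not answer in time.
    public static PrintService[] lookupWithTimeout(long timeout, TimeUnit unit) {
        Future<PrintService[]> f =
                LOOKUP.submit(() -> PrintServiceLookup.lookupPrintServices(null, null));
        try {
            return f.get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new PrintService[0];
        } catch (ExecutionException | TimeoutException e) {
            // The lookup thread may stay stuck in native code, but callers move on.
            return new PrintService[0];
        }
    }
}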
Related
I have the following code to start a netty server:
Application application = ApplicationTypeFactory.getApplication(type);
resteasyDeployment = new ResteasyDeployment();
// Explicitly setting the Application should prevent scanning
resteasyDeployment.setApplication(application);
// Need to set the provider factory to the default, otherwise the
// providers we need won't be registered, such as JSON mapping
resteasyDeployment.setProviderFactory(ResteasyProviderFactory.getInstance());
netty = new NettyJaxrsServer();
netty.setHostname(HOST);
netty.setPort(port);
netty.setDeployment(resteasyDeployment);
// Some optional extra configuration
netty.setKeepAlive(true);
netty.setRootResourcePath("/");
netty.setSecurityDomain(null);
netty.setIoWorkerCount(16);
netty.setExecutorThreadCount(16);
LOGGER.info("Starting REST server on: " + System.getenv("HOSTNAME") + ", port:" + port);
// Start the server
//("Starting REST server on " + System.getenv("HOSTNAME"));
netty.start();
LOGGER.info("Started!");
When I do:
netty.stop()
It doesn't appear to close the connection:
[john#dub-001948-VM01:~/workspace/utf/atf/src/test/resources ]$ netstat -anp | grep 8888
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 ::ffff:10.0.21.88:8888 ::ffff:10.0.21.88:60654 TIME_WAIT -
tcp 0 0 ::ffff:10.0.21.88:8888 ::ffff:10.0.21.88:60630 TIME_WAIT -
tcp 0 0 ::ffff:10.0.21.88:8888 ::ffff:10.0.21.88:60629 TIME_WAIT -
tcp 0 0 ::ffff:10.0.21.88:8888 ::ffff:10.0.21.88:60637 TIME_WAIT -
tcp 0 0 ::ffff:10.0.21.88:8888 ::ffff:10.0.21.88:60640 TIME_WAIT -
even after the program exits. In other posts I have read that netty does not close client connections on a stop. How do I shut it down cleanly?
TIME_WAIT is one of the TCP socket states; the application cannot do anything about it. More information here: http://www.isi.edu/touch/pubs/infocomm99/infocomm99-web/. For a busy server, you could tune some TCP/IP parameters to alleviate its impact.
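For what it's worth, those TIME_WAIT entries are held by the kernel for 2*MSL after the connections have already been closed (which is why they survive process exit with no owning PID); they do not mean the listening socket is still open. What does matter is that stop() is always called. A minimal sketch of registering it in a JVM shutdown hook, assuming the NettyJaxrsServer from the question (the NettyShutdown class name is mine):

import org.jboss.resteasy.plugins.server.netty.NettyJaxrsServer;

// Hypothetical helper: makes sure the server is stopped even on Ctrl-C / SIGTERM.
public final class NettyShutdown {

    public static void registerShutdownHook(NettyJaxrsServer netty) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                netty.stop();            // closes the listening channel and worker pools
            } catch (RuntimeException e) {
                e.printStackTrace();     // nothing useful to do this late; never let the hook throw
            }
        }, "netty-shutdown"));
    }
}

Calling NettyShutdown.registerShutdownHook(netty) right after netty.start() keeps the shutdown path in one place.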
We have a Java service that is loaded into a native daemon process on Linux. This daemon process blocks most signals and installs its own signal handlers, since it is a generic, mission-critical application. It is also a multi-threaded application that leverages pthreads heavily, and the HotSpot JVM is loaded in one of the threads.
After upgrading to the Java 7 JVM on 64-bit Linux (SLES, RH), we noticed that a ServerSocket waiting for a connection doesn't get signalled when the socket is closed. As per the JavaDoc, any thread currently blocked in accept() will throw a SocketException, and that is how we close the listening sockets when the service shuts down. We suspected the way we handle signals in the native process, since we had similar experiences years back, and that turned out to be true.
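For reference, this is the documented behaviour we rely on, shown as a tiny standalone Java sketch (not our product code; the port number is arbitrary). In a plain JVM this prints the SocketException message; in our daemon the accept() simply never unblocked until we adjusted the signal mask as described below.

import java.net.ServerSocket;
import java.net.SocketException;

public class AcceptCloseDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9999);

        Thread acceptor = new Thread(() -> {
            try {
                server.accept();                 // blocks until a client connects...
            } catch (SocketException expected) {
                // ...or until another thread closes the socket, which is what we rely on
                System.out.println("accept() unblocked: " + expected.getMessage());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, "acceptor");

        acceptor.start();
        Thread.sleep(1000);                      // let the acceptor block
        server.close();                          // should make accept() throw SocketException
        acceptor.join();
    }
}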
In our native process, we block signals as shown below (pseudocode). We also install our own handlers using sigaction(), which is not shown here.
sigset_t set;
sigfillset(&set);
sigdelset(&set, SIGTRAP);
sigdelset(&set, SIGSEGV);

/* Remove the following signals as they appear to be used by the JVM */
for (int s = SIGRTMIN; s <= SIGRTMAX-4; s++) {
    sigdelset(&set, s);
}

if ((err = pthread_sigmask(SIG_BLOCK, &set, 0)) != 0) {
    err_warn("Unable to block signals: %d", err);
}

/* pthread_create() for LoadJVM calls and continue.
   Threads are detached and hence no join() */

/* Read the current mask */
pthread_sigmask(SIG_BLOCK, 0, &set);

/* Wait on these signals */
while (bshutdown == false) {
    if ((sig = sigwaitinfo(&set, &info)) == -1) {
        /* something unexpected happened */
    }
    switch (sig) {
        /* Do something */
    }
}
What we found with the new JVM is that Java ServerSockets don't get notified when they are closed if we remove SIGRTMAX-2 and SIGRTMAX-3 from the set. Currently, we add these two signals to a set and call pthread_sigmask(SIG_UNBLOCK, &set, 0) in the thread that loads the JVM to resolve the issue.
My questions are:
1. Does anyone know if the JVM uses these signals? The JavaDoc on handling signals doesn't list them.
2. On Linux (tested on x86_64, kernels 2.6.32 and 3.11.6), reading the current signal mask (pthread_sigmask(SIG_UNBLOCK, 0, &set)) doesn't return the current mask; set is just 0. Has anyone seen this behaviour? It works fine on OS X and Solaris.
Thank you for the suggestion to use strace. Although I had used strace to see how the JVM blocks signals, I didn't think of checking how the socket is closed. This is what I found:
[pid 5525] rt_sigprocmask(SIG_BLOCK, [QUIT], NULL, 8) = 0
========== Waiting for 30 sec before shutdown ==========
========== IP : 0.0.0.0, Port : 9999 ==========
[pid 5525] rt_sigaction(SIGRT_30, {0x7f8844015200, [], SA_RESTORER, 0x7f884de779f0}, NULL, 8) = 0
[pid 5525] rt_sigprocmask(SIG_UNBLOCK, [RT_30], NULL, 8) = 0
========== Shutting down ==========
[pid 5516] tgkill(5515, 5525, SIGRT_30 <unfinished ...>
[pid 5525] --- SIGRT_30 {si_signo=SIGRT_30, si_code=SI_TKILL, si_pid=5515, si_uid=1000} ---
[pid 5525] rt_sigreturn() = -1 EINTR (Interrupted system call)
[pid 5516] <... tgkill resumed> ) = 0
which means that SIGRT_30 (SIGRTMAX-2) is sent to the thread blocked on the listening socket when the socket is closed.
I have a Java application running on CentOS 6.3 with Tomcat 7 as the container. Currently I am hitting this error: java.net.SocketException: maximum number of DatagramSockets reached
We use the MulticastSocket class to send messages. When this error happened, I checked the server's current UDP socket count with the command ss -s:
Total: 212 (kernel 248)
TCP: 70 (estab 15, closed 44, orphaned 0, synrecv 0, timewait 40/0), ports 22
Transport Total IP IPv6
* 248 - -
RAW 0 0 0
UDP 40 40 0
TCP 26 26 0
INET 66 66 0
FRAG 0 0 0
I also checked
ulimit -n
The default setting is 32768, so the UDP socket count does not seem to exceed the maximum.
Any ideas for this error?
We use the MulticastSocket class to send messages.
Why? You only need a MulticastSocket to receive multicasts.
Obviously you are leaking MulticastSockets. Presumably you are creating a new one per message and never closing it.
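To illustrate both points, a minimal sketch, assuming the messages are ordinary UDP sends (a plain DatagramSocket is enough for sending; the class name, group address, and port below are made up for illustration, and the try-with-resources variant assumes Java 7+): either reuse one long-lived socket, or close the per-message socket deterministically.

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical example class showing the two usual fixes for the socket leak.
public class UdpSender {

    // Option 1 (preferred): one long-lived socket reused for every message.
    private final DatagramSocket socket;
    private final InetAddress group;
    private final int port;

    public UdpSender(String groupAddress, int port) throws IOException {
        this.socket = new DatagramSocket();
        this.group = InetAddress.getByName(groupAddress);
        this.port = port;
    }

    public void send(String message) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, group, port));
    }

    public void close() {
        socket.close();
    }

    // Option 2: if a new socket per message is really required, always close it.
    public static void sendOnce(String message, InetAddress group, int port) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket s = new DatagramSocket()) { // closed even if send() throws
            s.send(new DatagramPacket(data, data.length, group, port));
        }
    }
}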
I'm new to Scala so the question may be quite simple, though I have spent some time trying to resolve it. I have a simple Scala TCP server (no actors, single thread):
import java.io._
import java.net._

object Application {
  def readSocket(socket: Socket): String = {
    val bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream))
    var request = ""
    var line = ""
    do {
      line = bufferedReader.readLine()
      if (line == null) {
        println("Stream terminated")
        return request
      }
      request += line + "\n"
    } while (line != "")
    request
  }

  def writeSocket(socket: Socket, string: String) {
    val out: PrintWriter = new PrintWriter(new OutputStreamWriter(socket.getOutputStream))
    out.println(string)
    out.flush()
  }

  def main(args: Array[String]) {
    val port = 8000
    val serverSocket = new ServerSocket(port)
    while (true) {
      val socket = serverSocket.accept()
      readSocket(socket)
      writeSocket(socket, "HTTP/1.1 200 OK\r\n\r\nOK")
      socket.close()
    }
  }
}
The server listens on localhost:8000 for incoming requests and sends an HTTP response with a single OK word in the body. Then I run Apache Benchmark like this:
ab -c 1000 -n 10000 http://localhost:8000/
which works nicely the first time. The second time I start ab, it hangs, producing the following output in netstat -a | grep 8000:
....
tcp 0 0 localhost.localdo:43709 localhost.localdom:8000 FIN_WAIT2
tcp 0 0 localhost.localdo:43711 localhost.localdom:8000 FIN_WAIT2
tcp 0 0 localhost.localdo:43717 localhost.localdom:8000 FIN_WAIT2
tcp 0 0 localhost.localdo:43777 localhost.localdom:8000 FIN_WAIT2
tcp 0 0 localhost.localdo:43722 localhost.localdom:8000 FIN_WAIT2
tcp 0 0 localhost.localdo:43725 localhost.localdom:8000 FIN_WAIT2
tcp6 0 0 [::]:8000 [::]:* LISTEN
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43724 CLOSE_WAIT
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43786 CLOSE_WAIT
tcp6 1 0 localhost.localdom:8000 localhost.localdo:43679 CLOSE_WAIT
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43735 CLOSE_WAIT
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43757 CLOSE_WAIT
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43754 CLOSE_WAIT
tcp6 83 0 localhost.localdom:8000 localhost.localdo:43723 CLOSE_WAIT
....
After that, no more requests are served by the server. One more detail: the same ab command with the same parameters works smoothly against a simple Node.js server on the same machine. So this issue does not seem to be related to the number of open TCP connections, which I have set to be reusable with
sudo sysctl -w net.ipv4.tcp_tw_recycle=1
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
Could anyone give me a clue on what I'm missing?
Edit: end-of-stream handling has been added to the code above:
if (line == null) {
println("Stream terminated")
return request
}
I'm posting a (partial) answer to my own question for those who will stumble upon the same issue one day. First, the nature of the problem lies not in the source code but in the system itself, which restricts the number of concurrent connections. The problem is that the socket passed to the readSocket function appears corrupted under some conditions, i.e. it cannot be read, and bufferedReader.readLine() either returns null on the first call or hangs indefinitely. The following two steps make the code work on some machines:
1. Increase the maximum backlog of pending connections on a socket with
sysctl -w net.core.somaxconn=65535
2. Provide the second parameter to the ServerSocket constructor, which explicitly sets the length of the connection queue:
val maxQueue = 50000
val serverSocket = new ServerSocket(port, maxQueue)
The steps above solve the problem on EC2 m1.large instances; however, I'm still getting issues on my local machine. A better way would be to use Akka for this kind of work:
import akka.actor._
import java.net.InetSocketAddress
import akka.util.ByteString

class TCPServer(port: Int) extends Actor {

  override def preStart {
    IOManager(context.system).listen(new InetSocketAddress(port))
  }

  def receive = {
    case IO.NewClient(server) =>
      server.accept()
    case IO.Read(rHandle, bytes) => {
      val byteString = ByteString("HTTP/1.1 200 OK\r\n\r\nOK")
      rHandle.asSocket.write(byteString)
      rHandle.close()
    }
  }
}

object Application {
  def main(args: Array[String]) {
    val port = 8000
    ActorSystem().actorOf(Props(new TCPServer(port)))
  }
}
First, I'd suggest trying this without ab. You can do something like:
echo "I'm\nHappy\n" | nc -vv localhost 8000
Second, I'd suggest handling end-of-stream. This is where BufferedReader.readLine() returns null; the code above only checked for an empty String. After you fix this, I'd try again, then test with ab once everything looks good. Let us know if the problem persists.
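For illustration only, the loop the answer describes, written against the same java.io API in plain Java (the Scala code in the question does the equivalent after the edit; the class and method names here are mine):

import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper showing the end-of-stream handling the answer describes.
public class RequestReader {

    // Reads header lines until the blank line that ends the HTTP headers,
    // treating null (end of stream / client gone) as normal termination.
    public static String readRequest(BufferedReader reader) throws IOException {
        StringBuilder request = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {  // null => peer closed the connection
            if (line.isEmpty()) {
                break;                                // blank line ends the header block
            }
            request.append(line).append('\n');
        }
        return request.toString();
    }
}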
I found out that when I attach a debugger to the application and start debugging, the connection to the Terracotta server is lost (?), and the following messages appear in the Terracotta server logs:
2012-03-30 13:45:06,758 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 1
2012-03-30 13:45:27,761 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 1
2012-03-30 13:45:31,761 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] WARN com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 2
...
2012-03-30 13:46:37,768 [L2_L1:TCComm Main Selector Thread_R (listen 0.0.0.0:9510)] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 might be in Long GC. GC count since last ping reply : 10. But its too long. No more retries
2012-03-30 13:46:38,768 [HealthChecker] INFO com.tc.net.protocol.transport.ConnectionHealthCheckerImpl. DSO Server - 127.0.0.1:55112 is DEAD
2012-03-30 13:46:38,768 [HealthChecker] ERROR com.tc.net.protocol.transport.ConnectionHealthCheckerImpl: DSO Server - Declared connection dead ConnectionID(1.0b1994ac80f14b7191080bdc3f38582a) idle time 45317ms
2012-03-30 13:46:38,768 [L2_L1:TCWorkerComm # 0_R] WARN com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(1.0b1994ac80f14b7191080bdc3f38582a): CLOSE EVENT : com.tc.net.core.TCConnectionJDK14#5158277: connected: false, closed: true local=127.0.0.1:9510 remote=127.0.0.1:55112 connect=[Fri Mar 30 13:34:22 BST 2012] idle=2001ms [207584 read, 229735 write]. STATUS : DISCONNECTED
...
2012-03-30 13:46:38,799 [L2_L1:TCWorkerComm # 0_R] INFO com.tc.objectserver.persistence.sleepycat.SleepycatPersistor - Deleted client state for ChannelID=[1]
2012-03-30 13:46:38,801 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.handler.ChannelLifeCycleHandler - : Received transport disconnect. Shutting down client ClientID[1]
2012-03-30 13:46:38,801 [WorkerThread(channel_life_cycle_stage, 0)] INFO com.tc.objectserver.persistence.impl.TransactionStoreImpl - shutdownClient() : Removing txns from DB : 0
After this happens, any cache operation, such as getWithLoader, just doesn't respond until the Terracotta server is restarted.
Question: how can this be fixed or reconfigured? I assume it can also happen in production (and actually sometimes does) if, for any reason, the application hangs or stalls.
This is just to get you started.
TC connections between server and client are considered dead when the applicable HealthCheck fails. The default values for the HealthCheck assume a very stable and performant network. I recommend you familiarize yourself with the details and the calculations on
http://www.terracotta.org/documentation/3.5.2/terracotta-server-array/high-availability#85916
So typically you begin with
a) making sure your network doesn't hiccup occasionally
b) setting the TC HealthCheck values a bit higher
If the problem persists I'd recommend posting directly on the TC forums (they'll help you even if you only use the open-source edition; it may take a few days to get a reply though).
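For point (b), the HealthCheck values live in the server's tc.properties. As a rough, unverified sketch of the direction only: raising the L2-to-L1 ping settings makes the server tolerate a client that is paused in a debugger (or in a long GC) for longer before declaring it dead. The property names and numbers below are recalled from the 3.x HealthCheck documentation linked above, so verify them against that page before using them:

# Sketch only - check the exact names and defaults against the HealthCheck documentation
l2.healthcheck.l1.ping.idletime = 5000
l2.healthcheck.l1.ping.interval = 5000
l2.healthcheck.l1.ping.probes = 24
l2.healthcheck.l1.socketConnectCount = 13
l2.healthcheck.l1.socketConnectTimeout = 5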