We have a Java service that is loaded into a native daemon process on Linux. The daemon blocks most signals and installs its own signal handlers, since it is a generic, mission-critical application. It is also heavily multi-threaded with pthreads, and the HotSpot JVM is loaded in one of the threads.
After upgrading to the Java 7 JVM on 64-bit Linux (SLES, Red Hat), we noticed that a ServerSocket waiting for a connection no longer gets signalled when the socket is closed. Per the JavaDoc, any thread currently blocked in accept() will throw a SocketException, and that is how we close the listening sockets when the service shuts down. We suspected the way we handle signals in the native process, since we had a similar experience years back, and that turned out to be true.
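For context, the shutdown pattern that relies on that JavaDoc behaviour looks roughly like this (a minimal sketch, not our actual service code; the class name and port handling are illustrative):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Minimal sketch of the accept-loop shutdown pattern described above.
class Listener implements Runnable {
    private final ServerSocket serverSocket;

    Listener(int port) throws IOException {
        serverSocket = new ServerSocket(port);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Socket client = serverSocket.accept(); // blocks here
                // ... hand the client off to a worker ...
            }
        } catch (SocketException expected) {
            // Another thread called shutdown(): per the JavaDoc, the
            // blocked accept() throws, and the listener exits cleanly.
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Called from the service shutdown path in another thread.
    void shutdown() throws IOException {
        serverSocket.close();
    }
}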
In our native process, we block signals as shown below (pseudocode). We also install our own handlers using sigaction(), which is not shown.
sigset_t set;
siginfo_t info;
int err, sig;

sigfillset(&set);
sigdelset(&set, SIGTRAP);
sigdelset(&set, SIGSEGV);
/* Remove the following signals, as they appear to be used by the JVM */
for (int s = SIGRTMIN; s <= SIGRTMAX - 4; s++) {
    sigdelset(&set, s);
}
if ((err = pthread_sigmask(SIG_BLOCK, &set, 0)) != 0) {
    err_warn("Unable to block signals: %d", err);
}
/* pthread_create() for the LoadJVM call, then continue.
   Threads are detached, hence no join(). */
/* Read the current mask (new set is NULL, so 'how' is ignored
   and the old mask is returned in &set) */
pthread_sigmask(SIG_BLOCK, 0, &set);
/* Wait on these signals */
while (bshutdown == false) {
    if ((sig = sigwaitinfo(&set, &info)) == -1) {
        /* something unexpected happened */
    }
    switch (sig) {
        /* Do something */
    }
}
What we found with the new JVM is that Java ServerSockets don't get notified when they are closed if SIGRTMAX-2 and SIGRTMAX-3 are missing from the set of signals we leave unblocked. Currently, we build a set containing these two signals and call pthread_sigmask(SIG_UNBLOCK, &set, 0) in the thread that loads the JVM, which resolves the issue.
My questions are:
1. Does anyone know if the JVM uses these signals? The JavaDoc on handling signals doesn't list them.
2. On Linux (tested on x86_64, kernels 2.6.32 and 3.11.6), reading the current signal mask with pthread_sigmask(SIG_UNBLOCK, 0, &set) doesn't return the current mask; set is just 0. Has anyone seen this behaviour? It works fine on OS X and Solaris.
Thank you for the suggestion to use strace. Though I had used strace to see how the JVM blocks signals, I didn't think of checking how the socket is closed. This is what I found:
[pid 5525] rt_sigprocmask(SIG_BLOCK, [QUIT], NULL, 8) = 0
========== Waiting for 30 sec before shutdown ==========
========== IP : 0.0.0.0, Port : 9999 ==========
[pid 5525] rt_sigaction(SIGRT_30, {0x7f8844015200, [], SA_RESTORER, 0x7f884de779f0}, NULL, 8) = 0
[pid 5525] rt_sigprocmask(SIG_UNBLOCK, [RT_30], NULL, 8) = 0
========== Shutting down ==========
[pid 5516] tgkill(5515, 5525, SIGRT_30 <unfinished ...>
[pid 5525] --- SIGRT_30 {si_signo=SIGRT_30, si_code=SI_TKILL, si_pid=5515, si_uid=1000} ---
[pid 5525] rt_sigreturn() = -1 EINTR (Interrupted system call)
[pid 5516] <... tgkill resumed> ) = 0
which means the JVM sends SIGRT_30 (SIGRTMAX-2) to the thread blocked in accept() when the listening socket is closed.
Related
I'm doing a little project that tries to compare (simulate) Bluetooth Legacy Advertising with Extended Advertising. I have two threads (one per device) doing their tasks. When I run the program normally it doesn't work (I kind of expected that), but when I run it with the debugger it works. I have only one breakpoint, and it is after any meaningful operations. So that is where my question arises: is there any difference between running it with or without the debugger?
I'm using IntelliJ IDEA (newest version) and Java 1.8
Output when running normally (stopped by me):
343
Process finished with exit code -1
Output with debugger:
Connected to the target VM, address: '127.0.0.1:58359', transport: 'socket'
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
343
21 .. -39
[21, ... -39]
DONE
Disconnected from the target VM, address: '127.0.0.1:58359', transport: 'socket'
Process finished with exit code 0
Code fragment with the breakpoint:
void secondaryListen() {
    if (!Simulation.World.channels[this.receivedAdvertisement.channel].empty) {
        SecondaryMessage currentMessage = (SecondaryMessage) Simulation.World.channels[this.receivedAdvertisement.channel].getPayload();
        for (byte b : currentMessage.content) {
            System.out.print(b + " ");
            this.receivedData.add(b);
        }
        System.out.println();
        this.mode = Mode.SCAN;
        if (currentMessage.lastMessage) {
            this.mode = Mode.FINISHED; // BREAKPOINT ON THAT LINE
            System.out.println(this.receivedData.toString());
        }
    }
}
Thanks for all the answers in advance!
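Yes, there can be a difference. A breakpoint suspends threads and forces the JIT to deoptimize, which changes timing and tends to mask data races on unsynchronized shared state (like Simulation.World.channels and mode above, which are touched from both threads). The classic shape of the problem, as a minimal sketch (names are illustrative, not from the project):

// Without 'volatile' on the flag, the reader thread may never observe the
// write and spin forever when run normally, yet "work" under a debugger
// because the breakpoint alters timing and optimization.
class VisibilityDemo {
    static volatile boolean done = false; // try removing 'volatile'

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(new Runnable() {
            public void run() {
                while (!done) {
                    // busy-wait: with a non-volatile flag, the JIT may
                    // hoist the read and this loop never terminates
                }
                System.out.println("DONE");
            }
        });
        reader.start();
        Thread.sleep(100);
        done = true; // publish the flag to the reader
        reader.join();
    }
}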
I have a gobbler that reads the output from a Process.
There is a case where we kill the process programmatically using its PID and the external Windows taskkill command.
It is a 16-bit DOS process
We use taskkill because it is a 16-bit DOS process and Process.destroyForcibly() does not work with it, since it resides in the ntvdm subsystem; the best approach is to get the PID and use taskkill /T /F, which does indeed kill it and any children.
Normally, we have no problem with our DOS 16-bit (or 32 bit) processes. This one has some file locks in place. It is especially important that we ensure it is dead to have the OS release the locks.
We close all streams before and after the kill
Prior to calling taskkill, we attempt to flush and close all streams (in, out, err) in an executor with a time limit, as sketched below. After calling taskkill, we verify that all streams are closed by re-closing them.
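The time-limited close (visible as the "Timeout waiting for 'Close ...' to finish" lines in the log further down) is done roughly like this; this is a sketch of the approach, not our exact code, and the names are illustrative:

import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run each close() on an executor and give up after a timeout, because
// close() on a pipe to a wedged ntvdm process can itself block forever.
final class TimedCloser {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    static void closeQuietly(final Closeable c, long timeoutMs, String label) {
        Future<?> f = POOL.submit(new Runnable() {
            public void run() {
                try { c.close(); } catch (Exception ignored) { }
            }
        });
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException te) {
            // corresponds to "Timeout waiting for 'Close ...' to finish";
            // a stuck close() still occupies a pool thread, but the caller
            // is no longer blocked on it
            System.err.println("Timeout waiting for '" + label + "' to finish");
            f.cancel(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}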
We call Thread.interrupt() on all gobblers after the kill
Now, after the kill succeeds, which is confirmed in the OS as well, the gobbler is still running, and it does not respond to Thread.interrupt().
We even do a last-ditch Thread.stop (gasp!)
And furthermore, we have invoked Thread.stop() on it and it still stays waiting at the read stage ...
So, it seems, we are unable to stop the std-out and std-in gobblers on our Processes streams.
We know Thread.stop() is deprecated. To be somewhat safe, we catch ThreadDeath, clean any monitors, and then rethrow ThreadDeath. However, ThreadDeath never in fact gets thrown, and the thread just keeps on waiting on inputStream.read, so Thread.stop() being deprecated is a moot point in this case... because it does not do anything.
Just so no one flames me, and so that I have a clean conscience: we have removed Thread.stop() from our production code.
I am not surprised that the thread does not interrupt, since that only happens on some InputStreams and not all reads are interruptible. But I am surprised that the thread will not stop when Thread.stop() is invoked.
Thread trace shows
A thread trace shows that both main-in and main-err (the two outputs from the process) are still running even after the streams are closed, the thread is interrupted, and the last-ditch Thread.stop() is called.
The task is dead, so why care about idle blocked gobblers?
It is not that we care that the gobblers won't quit, but we hate threads that just pile up and clog the system. This particular process is called by a web server, and it could amount to several hundred idle threads in a blocking state on dead processes...
We have tried launching the process two ways with no difference ...
run(working, "cmd", "/c", "start", "/B", "/W", "/SEPARATE", "C:\\workspace\\dotest.exe");
run(working, "cmd", "/c", "C:\\workspace\\dotest.exe");
The gobbler is in a read like this:
try (final InputStream is = inputStream instanceof BufferedInputStream
         ? inputStream : new BufferedInputStream(inputStream, 1024 * 64);
     final BufferedReader br = new BufferedReader(new InputStreamReader(is, charset))) {
    String line;
    while ((line = br.readLine()) != null) {
        lineCount++;
        lines.add(line);
        if (Thread.interrupted()) {
            Thread.currentThread().interrupt();
            throw new InterruptedException();
        }
    }
    eofFound = true;
}
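For comparison, a gobbler that stays responsive to interrupt() has to avoid parking inside read(); one way is to poll available() and sleep in between, so the thread spends its idle time in an interruptible wait. A rough sketch, not our production code (method and class names are illustrative; Process.isAlive() requires Java 8):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class InterruptibleGobbler {
    // Only calls read() when available() reports buffered data, so the
    // thread otherwise sits in sleep(), which does respond to interrupt().
    // Caveat: available() == 0 does not signal EOF, so we also stop once
    // the process is dead and the buffer is drained; if another thread
    // closes the stream, available() throws IOException, ending the loop.
    static byte[] gobble(InputStream in, Process p)
            throws IOException, InterruptedException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (true) {
            int avail = in.available();
            if (avail > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, avail));
                if (n < 0) break;          // EOF
                out.write(buf, 0, n);
            } else if (!p.isAlive()) {
                break;                     // process gone, nothing buffered
            } else {
                Thread.sleep(50);          // interruptible wait
            }
        }
        return out.toByteArray();
    }
}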
Our destroyer calls this on the gobbler thread after the taskkill:
int timeLimit = 500;
t.interrupt();
try {
    t.join(timeLimit);
    if (t.isAlive()) {
        t.stop();
        // We know it's deprecated, but we catch ThreadDeath,
        // then clean any monitors and then rethrow ThreadDeath.
        // But ThreadDeath never in fact gets thrown, and the thread
        // just keeps on waiting on inputStream.read ..
        logger.warn("Thread stopped because it did not interrupt within {}ms: {}", timeLimit, t);
        if (t.isAlive()) {
            logger.warn("But thread is still alive! {}", t);
        }
    }
} catch (InterruptedException ie) {
    logger.info("Interrupted exception while waiting on join({}) with {}", timeLimit, t, ie);
}
This is a snippet of the log output:
59.841 [main] INFO Destroying process '5952'
04.863 [main] WARN Timeout waiting for 'Close java.io.BufferedInputStream#193932a' to finish
09.865 [main] WARN Timeout waiting for 'Close java.io.FileInputStream#159f197' to finish
09.941 [main] DEBUG Executing [taskkill, /F, /PID, 5952].
10.243 [Thread-1] DEBUG SUCCESS: The process with PID 5952 has been terminated.
10.249 [main] DEBUG java.lang.ProcessImpl#620197 stopped with exit code 0
10.638 [main] INFO Destroyed WindowsProcess(5952) forcefully in 738 ms.
11.188 [main] WARN Thread stop called because it did not interrupt within 500ms: Thread[main-in,5,main]
11.188 [main] WARN But thread is still alive! Thread[main-in,5,main]
11.689 [main] WARN Thread stop because it did not interrupt within 500ms: Thread[main-err,5,main]
11.689 [main] WARN But thread is still alive! Thread[main-err,5,main]
Note: prior to calling taskkill, the Process std-out and std-err will not close, but they are closed manually after the taskkill (not shown in the log because the closes succeed).
I am running a Java job under Hadoop which is crashing the JVM. I suspect this is due to some JNI code (it uses JBLAS with a multithreaded native BLAS implementation). However, while I expected the crash log to supply the "problematic frame" for debugging, the log instead looks like:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f204dd6fb27, pid=19570, tid=139776470402816
#
# JRE version: 6.0_38-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.13-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# # [ timer expired, abort... ]
Does the JVM have some timer for how long it will wait when producing this crash dump output? If so, is there a way to increase the time so I can get more helpful information? I don't think the timer referred to is coming from Hadoop, since I see (unhelpful) references to this error in many places which do not mention Hadoop.
Googling appears to show that the string "timer expired, abort" only shows up in these JVM error messages, so it is unlikely to come from the OS.
Edit: It looks like I am probably out of luck. From ./hotspot/src/share/vm/runtime/thread.cpp in the OpenJDK version of the JVM source:
if (is_error_reported()) {
    // A fatal error has happened, the error handler(VMError::report_and_die)
    // should abort JVM after creating an error log file. However in some
    // rare cases, the error handler itself might deadlock. Here we try to
    // kill JVM if the fatal error handler fails to abort in 2 minutes.
    //
    // This code is in WatcherThread because WatcherThread wakes up
    // periodically so the fatal error handler doesn't need to do anything;
    // also because the WatcherThread is less likely to crash than other
    // threads.
    for (;;) {
        if (!ShowMessageBoxOnError
            && (OnError == NULL || OnError[0] == '\0')
            && Arguments::abort_hook() == NULL) {
            os::sleep(this, 2 * 60 * 1000, false);
            fdStream err(defaultStream::output_fd());
            err.print_raw_cr("# [ timer expired, abort... ]");
            // skip atexit/vm_exit/vm_abort hooks
            os::die();
        }
        // Wake up 5 seconds later, the fatal handler may reset OnError or
        // ShowMessageBoxOnError when it is ready to abort.
        os::sleep(this, 5 * 1000, false);
    }
}
It appears to be hard-coded to wait two minutes. Why crash reporting for my job is taking longer than that, I don't know, but I think this question at least has been answered.
The way around this is to specify -XX:+ShowMessageBoxOnError on the command line and attach to the process with a debugger from another terminal.
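Per the WatcherThread code above, the two-minute abort is skipped whenever ShowMessageBoxOnError is set or OnError is non-empty, so either of the following keeps the VM around for inspection (the jar name is illustrative; %p in OnError is replaced with the pid):

$> java -XX:+ShowMessageBoxOnError -jar myjob.jar
$> java -XX:OnError="gdb - %p" -jar myjob.jar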
I am doing kill -15 <PID> on my running JVM and it seems to be completely ignored.
The environment is:
Linux 2.6 kernel
jdk 1.6.0_20-x86-64
There are no references to sun.misc.SignalHandler in the project. The only (quite lame) clue I have is a call to AbstractApplicationContext.registerShutdownHook() in main. The JVM startup args do not contain anything related to signal handling.
There is nothing in the logs (DEBUG level) and nothing printed to stdout in reaction to kill -15.
How do I find out what causes SIGTERM to be ignored?
Normally, signals 1 (SIGHUP), 2 (SIGINT), 4 (SIGILL), 7 (SIGBUS), 8 (SIGFPE), 11 (SIGSEGV), and 15 (SIGTERM) on JVM threads cause the JVM to shut down; therefore, an application signal handler should not attempt to recover from these unless it no longer requires the JVM.
Since your JVM doesn't exit, you may need to check whether there is any of the following (a sketch of how a shutdown hook can block exit follows the list):
- Any use of Runtime.addShutdownHook
- Existence of the -Xrs option on JVM startup
- Any use of sun.misc.SignalHandler
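On the first point: a shutdown hook that blocks makes the JVM appear to ignore SIGTERM, because the default handler runs all hooks and waits for them to finish before exiting. A minimal sketch of that failure mode (class name and lock are illustrative); a jstack <pid> taken while the process hangs will show the hook thread parked here:

// After kill -15, the JVM runs this hook and then waits on it forever,
// so the process looks like it is ignoring SIGTERM.
public class StuckHook {
    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                synchronized (StuckHook.class) {
                    try {
                        StuckHook.class.wait(); // never notified: hook hangs
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        });
        Thread.sleep(Long.MAX_VALUE); // simulate a long-running server
    }
}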
Here is AbstractApplicationContext.registerShutdownHook() from the Spring source code:
public void registerShutdownHook() {
    if (this.shutdownHook == null) {
        // No shutdown hook registered yet.
        this.shutdownHook = new Thread() {
            @Override
            public void run() {
                doClose();
            }
        };
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }
}
I get this error on my UNIX server when running my Java server:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at [... where ever I launch a new Thread ...]
It happens every time I have about 600 threads running.
I have set this limit on the server:
$> ulimit -s 128
What looks strange to me is the result of this command, which I ran when the bug occurred the last time:
$> free -m
total used free shared buffers cached
Mem: 2048 338 1709 0 0 0
-/+ buffers/cache: 338 1709
Swap: 0 0 0
I launch my Java server like this:
$> /usr/bin/java -server -Xss128k -Xmx500m -jar /path/to/myJar.jar
My debian version:
$> cat /etc/debian_version
5.0.8
My java version:
$> java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
My question: I have read on the Internet that my program should be able to handle something like 5000 threads. So what is going on, and how do I fix it?
Edit: this is the output of ulimit -a when I open a shell:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 794624
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 794624
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I run the script as a daemon from init.d, and this is what I run:
DAEMON=/usr/bin/java
DAEMON_ARGS="-server -Xss128k -Xmx1024m -jar /path/to/myJar.jar"
ulimit -s 128 && ulimit -n 10240 && start-stop-daemon -b --start --quiet --chuid $USER -m -p $PIDFILE --exec $DAEMON -- $DAEMON_ARGS \
|| return 2
Edit 2: I have come across this Stack Overflow question with a Java test for threads: how-many-threads-can-a-java-vm-support
public class DieLikeADog {
    private static Object s = new Object();
    private static int count = 0;

    public static void main(String[] argv) {
        for (;;) {
            new Thread(new Runnable() {
                public void run() {
                    synchronized (s) {
                        count += 1;
                        System.err.println("New thread #" + count);
                    }
                    for (;;) {
                        try {
                            Thread.sleep(100);
                        } catch (Exception e) {
                            System.err.println(e);
                        }
                    }
                }
            }).start();
        }
    }
}
On my server, the program crashes after 613 threads. Now I'm certain this is not normal and is related only to my server configuration. Can anyone help, please?
Edit 3:
I have come across this article, and many others, explaining that Linux can't create 1000 threads, but you are telling me that you can do it on your systems. I don't understand.
I have also run this script on my server: threads_limits.c, and the limit is around 620 threads.
My website is now offline, and this is the worst thing that could have happened to my project.
I don't know how to recompile glibc and this stuff. It's too much work, IMO.
I guess I should switch to Windows Server, because none of the settings proposed on this page made any difference: the limit on my system is between 600 and 620 threads, no matter the program involved.
Just got the following information: this is a limitation imposed by my host provider. It has nothing to do with programming or Linux.
The underlying operating system (Debian Linux in this case) does not allow the process to create any more threads. See here for how to raise the maximum: Maximum number of threads per process in Linux?
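If the limit is on the OS side, these are the usual knobs to inspect and raise (standard Linux; availability and defaults vary by distribution):

$> cat /proc/sys/kernel/threads-max    # system-wide maximum number of threads
$> ulimit -u                           # per-user limit on processes/threads
$> sysctl -w kernel.threads-max=100000 # raise the system-wide limit (as root)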
I have read on the Internet that my program should handle something like 5000 threads or so.
This depends on the limits set in the OS, the number of running processes, etc. With correct settings you can easily reach that many threads. I'm running Ubuntu on my own computer, and in a single Java program I can create around 32,000 threads before hitting the limit, with all my "normal stuff" running in the background (this was done with a test program that just created threads that went to sleep immediately in an infinite loop). Naturally, that many threads actually doing something would probably grind consumer hardware to a halt pretty fast.
Can you try the same command with a smaller stack size ("-Xss64k") and report the results?
Your JVM fails to allocate stack or some other per-thread memory. Lowering the stack size with -Xss will increase the number of threads you can create before OOM occurs (but the JVM will not let you set an arbitrarily small stack size).
You can confirm this is the problem by seeing how the number of threads you can create changes as you tweak -Xss, or by running strace on your JVM (you'll almost certainly see an mmap() returning ENOMEM right before the exception is thrown).
Check also your ulimit on virtual size, i.e. ulimit -v. Increasing this limit should let you create more threads with the same stack size. Note that the resident set size limit (ulimit -m) is ineffective in current Linux kernels.
Also, lowering -Xmx can help by leaving more memory for thread stacks.
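For example, a session along these lines should confirm it (the java command and paths are taken from the question; the trace file name is arbitrary):

$> ulimit -v                   # current virtual memory limit, in KB
$> strace -f -o jvm.trace /usr/bin/java -server -Xss128k -Xmx500m -jar /path/to/myJar.jar
$> grep ENOMEM jvm.trace       # failing mmap()/clone() calls show up here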
I am starting to suspect that the "Native POSIX Thread Library" is missing.
$> getconf GNU_LIBPTHREAD_VERSION
Should output something like:
NPTL 2.13
If not, the Debian installation is messed up. I am not sure how to fix that, but installing Ubuntu Server seems like a good move...
With ulimit -n 100000 (open file descriptors), the following program should be able to handle around 32,000 threads.
Try it:
package test;

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.concurrent.Semaphore;

public class Test {

    final static Semaphore ss = new Semaphore(0);

    static class TT implements Runnable {
        @Override
        public void run() {
            try {
                Socket t = new Socket("localhost", 47111);
                InputStream is = t.getInputStream();
                for (;;) {
                    is.read();
                }
            } catch (Throwable t) {
                System.err.println(Thread.currentThread().getName() + " : abort");
                t.printStackTrace();
                System.exit(2);
            }
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            Thread t = new Thread() {
                public void run() {
                    try {
                        ArrayList<Socket> sockets = new ArrayList<Socket>(50000);
                        ServerSocket s = new ServerSocket(47111, 1500);
                        ss.release();
                        for (;;) {
                            Socket t = s.accept();
                            sockets.add(t);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                        System.exit(1);
                    }
                }
            };
            t.start();
            ss.acquire();

            for (int i = 0; i < 30000; i++) {
                Thread tt = new Thread(new TT(), "T" + i);
                tt.setDaemon(true);
                tt.start();
                System.out.println(tt.getName());
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return;
                }
            }
            for (;;) {
                System.out.println();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
Related to the OP's self-answer, but I do not yet have the reputation to comment.
I had the identical issue when hosting Tomcat on a V-Server.
All standard system checks (process count/limits, available RAM, etc.) indicated a healthy system, while Tomcat crashed with variants of "out of memory / resources / GCThread exceptions".
Turns out some V-Servers have an extra configuration that limits the number of allowed threads per process.
In my case (Ubuntu V-Server with Strato, Germany) this was even documented by the host, and the restriction can be lifted manually.
Original documentation by Strato (German) here: https://www.strato.de/faq/server/prozesse-vs-threads-bei-linux-v-servern/
tl;dr: How to fix:
- Inspect the thread limit per process:
systemctl show --property=DefaultTasksMax
- In my case the default was 60, which was insufficient for Tomcat, so I changed it to 256:
vim /etc/systemd/system.conf
Change the value for:
DefaultTasksMax=60
to something higher, e.g. 256. (The HTTPS connector of Tomcat has a default thread pool of 200, so it should be at least 200.)
- Then reboot to make the changes take effect.
It's running out of memory.
You also need to change the ulimit: if the OS does not give your app enough memory, -Xmx, I suppose, will not make any difference.
I guess the -Xmx500m is having no effect.
Try
ulimit -m 524288 (512 MB, expressed in KB)
with -Xmx512m