I'm running a shared library (compiled with Intel Fortran) using JNA. In rare cases the Fortran call ends with a forrtl: severe error, something like
forrtl: severe (408): fort: (3): Subscript #1 of the array IWU has value 0 which is less than the lower bound of 1
Is there a way to "catch" this runtime error using JNA?
There is no general mechanism for passing error codes from native code to JNA. At best, JNA will know the function failed from its return value.
However, if the native code provides a means of reading those error codes (e.g., the Windows GetLastError() function, or the global errno variable on Linux and macOS), then more specific details can be retrieved.
For an Intel Fortran "severe" error, program execution stops unless it's handled on the native side, so I don't think there's anything you can do to "catch" it in JNA unless it's already "caught" in the library you're using... in which case, you just need to properly react to the failed method call. The Fortran source can be modified to catch some runtime errors (such as in I/O, dynamic memory management or image control) and return this status. However, for this particular array bounds error of the question, this may not apply.
We have an Ada shared library compiled by GnatPro 19.2 that we're calling through a JNA call.
Our application runs fine on Windows. After porting it to Linux, the application crashes randomly with an Ada exception:
storage error or erroneous memory access.
Debugging with gdb (attaching the process) doesn't help much. We get various SIGSEGV, we continue, and after a while we get the storage error with no useable call stack.
Our shared library can be used from a Python native call without any issues whatsoever, so the issue is probably on the Java side.
We tried switching JVMs (OpenJDK or the official JDK) without luck.
Why is this? Is there a way to work around it?
The first hint is getting a bunch of SIGSEGV when trying to attach a debugger to the application, then seeing the program resuming when continuing.
It means that the SIGSEGV signal is handled on the Java side, as confirmed in "Why does java app crash in gdb but runs normally in real life?":
Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely, the pointer does not point to addressable memory, and the attempted load generates SIGSEGV ... which the Java runtime intercepts, makes the memory addressable again, and restarts the load instruction.
What happens is that, by default, the GNAT run-time installs its own signal handler to catch SIGSEGV and turn it into a clean Ada exception. One interesting feature of Ada exceptions is that they can print the stack trace even without a debugger, and this SIGSEGV handler redirection is what makes that possible.
But since Java uses speculative loads, SIGSEGV signals are expected from time to time on the Java side. Once the Ada shared library has been loaded and initialized, the Ada SIGSEGV handler is installed; it catches those "normal" SIGSEGV signals and aborts immediately.
Note that this doesn't happen on Windows. The Java runtime probably cannot use this speculative load mechanism there because of Windows limitations in handling memory access violations.
The signal handling is done in s-intman.adb
-- Check that treatment of exception propagation here is consistent with
-- treatment of the abort signal in System.Task_Primitives.Operations.
case signo is
when SIGFPE => raise Constraint_Error;
when SIGILL => raise Program_Error;
-- when SIGSEGV => raise Storage_Error; -- commenting this line should fix it
when SIGBUS => raise Storage_Error;
when others => null;
end case;
end Notify_Exception;
With that approach we'd have to rebuild a new native run-time and use it instead of the default one, which is pretty tedious and error-prone. That file is part of the gnarl library, so we'd have to rebuild the gnarl library dynamically with the proper options (-gnatp -nostdinc -O2 -fPIC) to create a substitute gnarl library... and do that again whenever the compiler is upgraded...
Fortunately, an alternate solution was provided by AdaCore:
First create a pragmas file in the .gpr project directory (let's call it no_sigsegv.adc) containing:
pragma Interrupt_State (SIGSEGV, SYSTEM);
to instruct the run-time not to install the SIGSEGV handler
Then add this to the Compiler package of the .gpr file:
package Compiler is
...
for local_configuration_pragmas use Project'Project_dir & "/no_sigsegv.adc";
and rebuild everything from scratch. Testing: not a single crash whatsoever.
I have a Java application that uses a C++ DLL via JNA. The C++ DLL is proprietary, so I cannot share the code unless I can make a simplified reproducible example, and that is not straightforward until I debug further.
The application crashes sporadically with the error message Java Result: -1073740940. I am running the Java application from NetBeans, although it also crashes without NetBeans. Since there is no hs_err_.log, I guess the crash is in the C++ layer. How can I begin debugging this crash?
The "Java Result" output from Netbeans simply tells you the exit code of the java program. You could generate the same with a System.exit(-1073740940);. A successful program exits with a code of 0. Anything else is a failure that requires documentation to interpret.
You have not given us any indication what DLL you are using, so the only information we have to work with is this exit code. Converting that int to hex digits results in 0xc0000374 which you can enter into your favorite search engine and find out is a Heap Corruption Exception. Some examples are provided but in general this means you are accessing non-allocated native memory.
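The conversion from the signed exit code to its hex form can be checked in a few lines of Java. This is only a small sketch; the class and method names are made up for illustration, and the constant is the Windows status value for heap corruption mentioned above.

```java
// Illustrative helper (hypothetical names): formats a process exit code the way
// Windows status codes are usually written, so it can be looked up.
final class ExitCodeDecoder {
    // Windows status value for heap corruption; as a signed int this is -1073740940.
    static final int STATUS_HEAP_CORRUPTION = 0xC0000374;

    /** Formats a signed 32-bit exit code as an 8-digit unsigned hex string. */
    static String toHex(int exitCode) {
        return String.format("0x%08X", exitCode);
    }

    public static void main(String[] args) {
        int exitCode = -1073740940; // the "Java Result" reported by the IDE
        System.out.println(toHex(exitCode));
        System.out.println(exitCode == STATUS_HEAP_CORRUPTION);
    }
}
```

Integer.toHexString(exitCode) gives the same digits in lower case without the prefix.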
Without having any idea what code you're using, I would guess you're doing something wrong with native memory, invoking native functions, or incorrectly manipulating pointers or handles somewhere in your application.
You should start by looking closely at arguments to native functions. Type mapping could be a problem if the number of bytes is mismatched. Investigate any Pointer-based arguments to native functions, including ByReference arguments. Trace back in the code and find when/how these Pointers were associated with native-allocated memory. If it was never allocated, that's one possibility for the problem. If it was allocated, see if you can find a point where that memory was freed, possibly by a different native function.
The root cause of the crash was heap corruption in the C++ layer. When a random crash is caused by heap corruption, it can be hard to pinpoint the cause because the crash may happen much later, when the program tries to use the corrupted memory. For the same reason it is hard to provide an SSCCE, especially when working on proprietary legacy code.
How I debugged this crash:
Reproduction: Try to find a consistent use case for the crash. If the crash is random then try to figure out a set of user actions that always leads to the crash.
Assumption: Guess which feature/component contains the crash.
Validation: Make sure that crash is not happening when you disable this feature/component.
Verification: Skim through and slice the code. Review it one small piece at a time.
Documentation: Write everything down.
Daniel's answer was very helpful in fixing this crash!
In Java there will be a stacktrace that says StackOverflowError and the whole system won't crash, only the program.
In C I'm aware that an array index out of bounds will produce a segmentation fault. Is it the same for a stack overflow in C and there will also be a segmentation fault i.e. same error type for a similar problem?
I'm not testing a deliberate infinite recursion in C to see what happens because I don't know the consequences.
Or is it sometimes something much worse and a stack overflow in C could cause an operating system failure and force you to power cycle to get back? Or even worse, cause irreversible hardware damage? How bad effects can a stack overflow mistake have?
It seems clear that the protection is better in Java than in C. Is it any better in C than in assembly / machine code, or is it practically the same (lack of) protection in C as in assembly?
In C I'm aware that an array index out of bounds will produce a segmentation fault. Is it the same for a stack overflow in C and there will also be a segmentation fault i.e. same error type for a similar problem?
There's no guarantee in C that there will be a segmentation fault. The C standard says it's undefined behaviour and leaves it at that. How that might manifest, if at all, is up to the implementation/platform.
Or is it sometimes something much worse and a stack overflow in C could cause an operating system failure and force you to power cycle to get back? Or even worse, cause irreversible hardware damage? How bad effects can a stack overflow mistake have?
It's pretty rare on modern operating systems that anything untoward would happen to the system; typically, only the program would crash. Modern operating systems use various memory protection techniques.
It seems clear that the protection is better in Java than in C. Is it any better in C than in assembly / machine code, or is it practically the same (lack of) protection in C as in assembly?
That's because in Java, memory is "managed". In C, it's left to the programmer; it's by design. A C compiler does generate machine code in the end, so it can't be any better or worse at run time. Obviously, a good compiler could detect some of these problems and warn you, which is an advantage of C compared to assembly.
Memory failures, like any system resource failure, are handled by the OS, not by the language itself.
Apart from specific preventive measures such as stack checking, this kind of problem normally triggers an OS exception that the language runtime can handle.
Stack checking, if enabled (normally through switches on the compiler command line), instructs the compiler to insert probe code for each stack-consuming operation to verify that memory is available.
By default, when execution tries to access memory outside the bounds of the allocated stack space, whether through overuse of the stack or corruption, the OS raises a structured exception. Java, like many C runtimes, normally handles those exceptions and also supplies a way to pass them to user code for possible recovery (e.g. through signals or SEH). If user code has registered no handler, control passes to the runtime, which by default performs a controlled shutdown of the task (graceful shutdown).
If no handling is available, not even from the runtime, the OS shuts the task down and abruptly releases its resources (e.g. truncating files, closing ports).
In any case the OS will protect the system, unless the OS itself is flawed...
In C it is normal to register a handler that protects the code fragment that can fail. How you handle the exception depends on your OS (e.g. under Windows you can wrap the code that can fail in a __try / __except exception handler).
Each executing thread has its own stack, allocated at run time when the thread is created. If a stack overflow is detected during execution of a natively compiled program, only your program (process) is affected, not the OS.
This is not a C problem; at least, what happens is not specified by C. C only says that it is undefined behavior, so the effect is a matter of the runtime. On any reasonable OS this will produce some kind of error that will be caught, and on *nixes it will deliver a segmentation fault to your process. Even exotic small OSes will protect themselves from your faulty process. In any case, this will never crash the OS.
Java is not better than C; they are different languages with different runtimes. Java is, by design, more secure in the sense that it protects you against many memory problems (among others). C gives you finer control over the machine, and yes, it is more or less a kind of assembly language.
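To make the Java side of this comparison concrete, here is a minimal sketch (class and method names are illustrative only) that recurses without a base case and shows that the JVM reports the overflow as a StackOverflowError, which unwinds the stack while the process keeps running. Catching this error is done here purely for demonstration; it is not a recommended recovery strategy in production code.

```java
// Illustrative demo: unbounded recursion in Java ends in a StackOverflowError,
// not in a crash of the JVM process.
final class StackOverflowDemo {
    private static int depth;

    // Recurses forever; each call consumes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the JVM throws StackOverflowError and returns the depth reached. */
    static int overflowDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has been unwound back to this frame; execution continues normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Overflowed after " + overflowDepth() + " calls; still running.");
    }
}
```

The depth reached depends on the JVM, its settings and the thread's stack size, so no particular number should be expected.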
Is there an easy way to get the native sizeof(int) from the Java VM running on a particular platform? The value I want is not Integer.SIZE, in particular - the size of a Java int, but rather what you'd get from sizeof(int) in C on the platform.
I need this because I'm using a particular library that reads and writes binary files, and trying to parse those files, whose interpretation depends in a particular way on the size of the machine int. I'd like it to be portable.
I get the impression that including JNA will give me that capability - but I'd rather not have to include a native library dependency (again, portability), and I don't want to play nasty games like the only solution I could come up with offhand - allocating many direct int buffers and looking at management/memory metrics before and after. That's a hack and not reliable...
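A comment suggested a system property, so I looked at the list of those. It turns out there is one that reports the data model of the running JVM ("32" or "64"), which was enough for my purposes. Note that it is a vendor-specific property and describes the JVM's word size rather than sizeof(int) as such; on most common platforms a C int is 4 bytes under either data model.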
System.getProperty("sun.arch.data.model")
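A slightly more defensive way to read that property might look like the sketch below. The class name and the -1 fallback are choices made for this example; the property itself is not part of the standard set of system properties, so it may be missing or report "unknown" on some JVMs.

```java
// Illustrative helper: reads the vendor-specific "sun.arch.data.model" property.
final class DataModel {
    /** Returns 32 or 64 when the JVM reports it, or -1 when the value is absent or not numeric. */
    static int bits() {
        String value = System.getProperty("sun.arch.data.model");
        if (value == null) {
            return -1; // property not provided by this JVM
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return -1; // e.g. "unknown"
        }
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + bits() + " bits");
    }
}
```

Callers should treat -1 as "cannot tell" and fall back to whatever default the file format documents.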
I started looking into JNI, and from what I understand, if a problem occurs in the loaded DLL, the JVM may terminate on the spot.
I.e. the process cannot be protected the way it can be when catching an exception.
So if my understanding is correct, my question is whether there is a standard approach/pattern for this situation when using JNI.
Or to state it differently, are processes that use JNI designed in a way that avoids these issues?
Or are such problems not expected to occur?
Thank you.
Yes, the JVM will just terminate, which is one of the reasons why JNI code is really hard to debug. If you are using C++ code you can use exceptions and then map them to a Java exception, which at least gives you some level of security but doesn't help with things like bad memory access.
From an architecture point of view I suggest decoupling your code from JNI as much as possible. Create a class / procedure structure that is entirely testable from C++ / C and let the JNI code do only the conversion work. If the JVM then crashes, you at least know where you have to look.
The principles are no different from any multi-threaded C application:
Always check all your input thoroughly.
Always free up temporary memory you allocated.
Make sure your functions are re-entrant.
Don't rely on undefined behaviour.
The Java virtual machine offers you no extra protection for your native code: if it fails or leaks, your VM will fail or leak.
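The first principle in the list can also be applied on the Java side, before a call ever reaches native code. The sketch below assumes a hypothetical native entry point sum(double[] data, int count); the NativeSum interface and the CheckedSum wrapper are invented for illustration, and in a real project the interface would be bound to the library through JNI or JNA.

```java
// Hypothetical native entry point; a real binding would come from JNI or JNA.
interface NativeSum {
    double sum(double[] data, int count);
}

// Validates arguments in Java so that obviously bad input never reaches native code.
final class CheckedSum implements NativeSum {
    private final NativeSum delegate;

    CheckedSum(NativeSum delegate) {
        this.delegate = java.util.Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public double sum(double[] data, int count) {
        java.util.Objects.requireNonNull(data, "data");
        if (count < 0 || count > data.length) {
            throw new IllegalArgumentException(
                "count " + count + " is outside 0.." + data.length);
        }
        return delegate.sum(data, count);
    }
}

final class CheckedSumDemo {
    public static void main(String[] args) {
        // Pure-Java stand-in for the native library, used only for this demo.
        NativeSum fake = (data, count) -> {
            double total = 0;
            for (int i = 0; i < count; i++) {
                total += data[i];
            }
            return total;
        };
        NativeSum checked = new CheckedSum(fake);
        System.out.println(checked.sum(new double[] {1.0, 2.0, 3.0}, 2));
    }
}
```

A wrapper like this only rejects input that is detectably wrong from Java; it does not protect against bugs inside the native code itself.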
You can have exactly the same spectrum of error handling in a JNI library as in anything else.
You can use try/catch. If you are on Windows, you can use SEH. If you are on Linux, you can call sigaction.
Still, if you mess up and there's a SIGSEGV, your JVM is probably toast whether you try to catch that signal or not.