Native stack and Cache code in JVM

Native stack and Cache code in JVM - java

Java byte code is interpreted however this is not as fast as directly executing native code on the JVM’s host CPU. To improve performance the Oracle Hotspot VM looks for “hot” areas of byte code that are executed regularly and compiles these to native code. The native code is then stored in the code cache in non-heap memory. In this way the Hotspot VM tries to choose the most appropriate way to trade-off the extra time it takes to compile code verses the extra time it take to execute interpreted code.
Not all JVMs support native methods, however, those that do typically create a per thread native method stack. If a JVM has been implemented using a C-linkage model for Java Native Invocation (JNI) then the native stack will be a C stack. In this case the order of arguments and return value will be identical in the native stack to typical C program. A native method can typically (depending on the JVM implementation) call back into the JVM and invoke a Java method. Such a native to Java invocation will occur on the stack (normal Java stack); the thread will leave the native stack and create a new frame on the stack (normal Java stack).
What does it mean native as for Code cache and native stack? When I write a program on Ubuntu and run it on Eclipse, and the program runs endlesly a loop in which there is some code, will this code go to code cache? Will it be recompiled to some native language and which, when I am using Ubuntu, Oracle Java 1.7, I have Intel i5 processor.
The same question about Oracle JVM7 - is it written in C? When native stack would be used and what exactly mean native in this case, when I have Ubuntu and Oracle Java 1.7 ?

As far as I understand JVM creates a separate stack for native methods bind to Java code via JNI. It has been done so because those methods has their own stack specifications. But it all depends on JVM implementation, it also could be that there is no separate stack for Native methods.
You can take a look at this article:
http://www.artima.com/insidejvm/ed2/jvm9.html

Related

How to access JVM internal data structures using the Hotspot Dynamic Attach Mechanism?

According to the OpenJDK's website, it is possible to attach a thread to Hotspot (Dynamic Attach API) which can collect information about it. I couldn't find any material on the internet on how to obtain information about Hotspot's internal data structures such as the operand stack or the state of the bytecode interpreter(to know which bytecode is currently executing) or to retrieve the current Stack Frame etc.
Also, if this is not possible with the Dynamic Attach API, how can this be done using the Serviceability Agent? The only example I found on the internet is this gist from Github which shows how to attach to a running JVM and get the values of some fields. But how to access the aformentioned internal data structures in the JVM?

The article Creating Your Own Debugging Tools briefly describes both Dynamic Attach and Serviceability Agent.
Dynamic Attach allows to connect to a running JVM and execute one of the predefined commands like
Print stack traces
Dump heap
Query or set a VM flag
Load an agent library
etc.
Basically, standard jstack, jmap and jcmd tools cover nearly all functions provided by Dynamic Attach. This API is not for accessing internal JVM structures. I doubt it can help with your task, except for loading a custom JVM TI library.
Serviceability Agent is closer to the JVM internal structures. Indeed, it can read JVM memory and recover structures like Code Cache, Stack Frames, TLAB, Constant Pool etc.
SA javadoc is available here. There are some examples of SA-based tools in JDK sources.
However, SA does not meet your requirements either.
It is a read only interface.
It works out of the process. SA-based tools suspend the JVM process entirely and read its memory using ptrace.
It is rather slow. It's main purpose is to debug unresponsive (or dead) JVM process.
Regarding operand stack, bytecode pointer etc. These notions exist only in the interpreter. Once a method is JIT-compiled, it no longer has structures you are asking about.
The locals and operands may be allocated in CPU registers or converted to constants.
The machine code does not always map one-to-one to the bytecode.
An inlined method may not even have its own stack frame, and so on.
Executing bytecodes one by one would mean giving up JIT compilation. JVM TI SingleStep indeed works only in the interpreter. Java application may work 10-100 times slower in a purely interpreted mode.
If you want to keep performance of your debugger reasonable, processing each bytecode instruction one after another is not an option. As told before, instrumentation is the right way to go. Note that it's not necessary to intercept every single bytecode - instrumenting basic blocks should be enough.

Calling C++ library function using JNI and which process executes that C++ library

I am new to Java & JNI. This question maybe very newbe. I have C++ library and Java application which interns call the C++ function using JNI concepts.
As per my understanding, JVM loads the C++ dll/SO in JVM space before calling a native function call.
If my understanding on the JVM is correct on JNI. Can someone tell me which/who is going to execute the C++ library function which is loaded inside the JVM.
Let say for C++, there is standard dynamic linker-loader present to handling the dynamic execution part of the C++ and executes all the machine instructions.
In case of JVM loaded JNI Libs (in this case C++ libs), does JVM executes the those libs ? If so does it uses its memory to execute the native function?
Thanks in advance.

The Java language allows you to mark certain methods as native. The Java Native Interface allows you to link these Java methods to a function address in native code.
When you System.loadLibrary a library that contains native code, the JVM will do two things:
Look for specifically named functions such as Java_pkg_Cls_f_ILjava_lang_String_2 and link this to the function f in class pkg.Cls.
Call JNI_OnLoad, if it exists in the library. This can perform initialization and optionally link more native methods using registerNatives.
After this point, the native library indeed resides in the process' memory space like any other library (say, libcurl or libssl). When you actually call one of the native methods, the JVM will find the function address and use a native call instruction to jump into the function. The function will execute as part of the stack trace of that thread and will show up as such in both the JVM and native stack traces.
In more advanced cases, the library might spawn additional native threads. These work like regular threads in native code and are invisible to the JVM. If these threads need to talk to the JVM as well, the developer can attach them.

What is the role of the OS when JVM executes a Java application? And why do we need the OS?

I have done some reading on the internet and some people say that Java application is executed by the java virtual machine (JVM). The word "execute" confuses me a little bit. As I know, a non-Java application (i.e: written in C, C++...) can be executed by the Operating system. At the lower level, it means the OS will load the binary program into memory, then direct the CPU to execute instructions in memory.
So now with a JVM, what would happen? As I know, JVM (contains a run-time environment) would be called first by the OS. From that point on, the JVM will spawns one (or many) threads for the application. I wonder if the role of the OS comes into play any more? It seems to me that the JVM has "by-passed" the OS and directly instruct the CPU to execute the application. If so, why do we need the OS?
Taking a little bit further, the JVM will use its JIT to compile the application's byte codes into machine codes, then execute those machine codes. Since it is already machine codes, do we need the JVM any more? Because instead of JVM, the OS can be able to instruct the CPU to execute those machine codes. Am I making any mistake here?
I would like to learn more from people here. Please correct me if I am wrong. Thank you so much!

We need the OS for all the things a C or C++ program would. The JVM does a few more things by default, but it doesn't replace anything the OS does. The only difference might be that sometimes you have Your Code [calls the] JVM [calls the] OS, or with compiled code you can have Your Code [calls the] OS
Similarly in C++ you might have Your Code [calls the] Boost [calls the] OS.
When your program is running in native code, it doesn't need the JVM as such. This is good because the JVM knows when to "stand back" and let the application run. However, not all the program will be compiled to native code for the rest of the life of the application, so you still need it.
Its is possible to use kernel by-pass devices/drivers with JNI, but Java doesn't directly support this sort of feature.

It seems to me that the JVM has "by-passed" the OS and directly
instruct the CPU to execute the application. If so, why do we need the
OS?
All C/C++ binaries (not just the JVM) run directly on the CPU. Once running, these programs can call into more machine code provided by the operating system to do useful things like reading files, starting threads, or using the network.
The JVM translates a Java program into instructions that run on the CPU. Behind the scenes, though, Java's threads, file i/o, and network sockets (to name a few) all contain instructions that call into the code provided by the operating system for threads/files/etc. This is one of the reasons you still need the OS.
Since it is already machine codes, do we need the JVM any more?
The JVM provides features that you don't get from the JIT compiler. At the end of the day the JVM is just running a lot of machine code, but not all of that machine code comes from the JIT (or from the interpreter). Some of that machine code does garbage collection, for example. That's why you need the JVM.

The underlying base O/S still has to do almost everything for the JVM, not least:
Input / Output
Memory management
Creation of threads (if using native threads)
Time sharing - i.e. allowing more than one process to run
and lots more besides!

Well, I want to keep this simple.
How you coded in ZX Spectrum, that is in old days, when really you don't use OS (even before DOS era, in pre-PC era). You write your code, and you have to manage all. In many cases there were no compiler, so your program was interpreted.
Next, it was realized that OS is great thing and the programs became simpler. Also, compiler was in broader use. I am talking about C++, for example. In those programs if you need to call to some OS function you added needed library and makes your call. One of the drawbacks where that now, your program is OS-depended, another problem was that your programs includes OS DLL in some fixed version. If another program on same station required that DLL in different version you were in trouble.
In the early days of the JVM history no JIT compiler where in used. So, your program run in interpreted mode. Your application has no longer needed to call OS directly, instead it use JVM for all it needs. JVM instead redirect some of the application calls to the OS. Think about JVM as mediator. One of the best features of the JVM where it's universality. You where not needed to stick to the specific OS (while in practice you do need to make some minor adjustments, when you are not stick to the Java requirement while your program "occasionally" works in some specific OS, for example you use C:\ for the files or assumptions upon Thread scheduler that is happen to be true on current OS, but generally JVM is not guaranteed to be true). Programmers of the JVM develop such API that can be easy in use for Java developer in one hand and it will be possible to map to any OS system calls on another hand.
JVM provides you more that simple wrapper to the OS. It has it's own memory model (thread synchronization) for example, that has some week grantees on it's own (it was totally revised in JDK 1.5, because it was broken). It also have garbage collection, it's initialize variables to null values (int i; i will be initialize to 0). That is JVM besides being moderator to OS is acting as helper code for your own application.
At some point JIT was added. It was added to reduce overhead that JVM creates. When some assumptions holds, usually after one execution of the code, command interpretation can be compiled to the machine code (I skip phase of byte code). It is optimization, and I don't know any case where you can effected.
In JDK 1.6 another optimization where added. Now, some objects in some circumstances can be allocated at the stack and not on the heap. I don't know, may be it has some side-effects, but it is example of what JVM can do for you.
And my last remark. When you compile your code, what really happens, your program is checked to be syntactically correct and then it byte-code generated (.class file). Java language use subset of the existing byte-codes (this is how AOP was implemented, using existing byte-codes that where not part of Java language). When java program is executed these byte codes are interpreted, they are translated on-the-fly to the machine instructions. If JIT is on, than some of the execution lines can be compiled to the machine language and reused instead of on-the-fly interpretation.

Since it is already machine codes, do we need the JVM any more?
compiled java programs are not the machine codes. [javac] compile [.java] file into bytecode [.class] file.Then these bytecodes are given to JRE [Java Run-time Environment].
Now the java interpreter comes in action that interpret the bytecode into native machine code that run on the CPU.

As we know OS does not Execute any program it provide Environment for processor to Execute if we talk about Environment it allocate Memory Loading file Giving instruction to the Processor,Manage the Address
of loaded data method work of the Processor is only Executing program this thing happen in c or any procedural programming Language if we see than OS Playing a very vital Role in this overhead on OS
Because if we write a small simple program in c like Hello World which contains only one Main function when will compile it generate .exe file of more than one function which is taken from Library function so manage all thing by OS is tedious job so in JVM has given Relief to OS here the work of OS is only to
load the JVM from hard disk to RAM and make the jvm Execute and allocate space for JVM to execute java Program here Momery allocation ,Loading on Byte code file from Hard disk,Address Management ,Memory Allocation and De-allocation is Done by JVM itself so OS is free it can do other work.jvm allocate or Deallocate memory based on What ever OS has given to Execute the java Program.
if we talk about Execution JVM Contains Interpreter as well as JIT compiler which converting the Byte code into Machine code of Required Function after Execution of the method Executable code of that method is destroyed thats why we can say java does have .EXE File

JIT vs Interpreters

I couldn't find the difference between JIT and Interpreters.
Jit is intermediary to Interpreters and Compilers. During runtime, it converts byte code to machine code ( JVM or Actual Machine ?) For the next time, it takes from the cache and runs
Am I right?
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
How the real processor in our pc will understand the instruction.?
Please clear my doubts.

First thing first:
With JVM, both interpreter and compiler (the JVM compiler and not the source-code compiler like javac) produce native code (aka Machine language code for the underlying physical CPU like x86) from byte code.
What's the difference then:
The difference is in how they generate the native code, how optimized it is as well how costly the optimization is. Informally, an interpreter pretty much converts each byte-code instruction to corresponding native instruction by looking up a predefined JVM-instruction to machine instruction mapping (see below pic). Interestingly, a further speedup in execution can be achieved, if we take a section of byte-code and convert it into machine code - because considering a whole logical section often provides rooms for optimization as opposed to converting (interpreting) each line in isolation (to machine instruction). This very act of converting a section of byte-code into (presumably optimized) machine instruction is called compiling (in the current context). When the compilation is done at run-time, the compiler is called JIT compiler.
The co-relation and co-ordination:
Since Java designer went for (hardware & OS) portability, they had chosen interpreter architecture (as opposed to c style compiling, assembling, and linking). However, in order to achieve more speed up, a compiler is also optionally added to a JVM. Nonetheless, as a program goes on being interpreted (and executed in physical CPU) "hotspot"s are detected by JVM and statistics are generated. Consequently, using statistics from interpreter, those sections become candidate for compilation (optimized native code). It is in fact done on-the-fly (thus JIT compiler) and the compiled machine instructions are used subsequently (rather than being interpreted). In a natural way, JVM also caches such compiled pieces of code.
Words of caution:
These are pretty much the fundamental concepts. If an actual implementer of JVM, does it a bit different way, don't get surprised. So could be the case for VM's in other languages.
Words of caution:
Statements like "interpreter executes byte code in virtual processor", "interpreter executes byte code directly", etc. are all correct as long as you understand that in the end there is a set of machine instructions that have to run in a physical hardware.
Some Good References: [I've not done extensive search though]
[paper] Instruction Folding in a Hardware-Translation Based Java Virtual
Machine by Hitoshi Oi
[book] Computer organization and design, 4th ed, D. A. Patterson. (see Fig 2.23)
[web-article] JVM performance optimization, Part 2: Compilers, by Eva Andreasson (JavaWorld)
PS: I've used following terms interchangebly - 'native code', 'machine language code', 'machine instructions', etc.

Interpreter: Reads your source code or some intermediate representation (bytecode) of it, and executes it directly.
JIT compiler: Reads your source code, or more typically some intermediate representation (bytecode) of it, compiles that on the fly and executes native code.

Jit is intermediary to Interpreters and Compilers. During runtime, it converts byte code to machine code ( JVM or Actual Machine ?) For the next time, it takes from the cache and runs Am i right?
Yes you are.
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
Yes, it is.
How the real processor in our pc will understand the instruction.?
In the case of interpreters, the virtual machine executes a native JVM procedure corresponding to each instruction in byte code to produce the expected behaviour. But your code isn't actually compiled to native code, as with Jit compilers. The JVM emulates the expected behaviour for each instruction.

A JIT Compiler translates byte code into machine code and then execute the machine code.
Interpreters read your high level language (interprets it) and execute what's asked by your program. Interpreters are normally not passing through byte-code and jit compilation.
But the two worlds have melt because numerous interpreters have take the path to internal byte-compilation and jit-compilation, for a better speed of execution.

Interpreter: Interprets the bytecode if a method is called multiple times every time a new interpretation is required.
JIT: when a code is called multiple time JIT converts the bytecode in native code and executes it

I'm pretty sure that JIT turns byte code into machine code for whatever machine you're running on right as it's needed. The alternative to this is to run the byte code in a java virtual machine. I'm not sure if this the same as interpreting the code since I'm more familiar with that term being used to describe the execution of a scripting (non-compiled) language like ruby or perl.

The first time a class is referenced in JVM the JIT Execution Engine re-compiles the .class files (primary Binaries) generated by Java Compiler containing JVM Instruction Set to Binaries containing HOST system’s Instruction Set. JIT stores and reuses those recompiled binaries from Memory going forward, there by reducing interpretation time and benefits from Native code execution.
And there is another flavor which does Adaptive Optimization by identifying most reused part of the app and applying JIT only over it, there by optimizing over memory usage.
On the other hand a plain old java interpreter interprets one JVM instruction from class file at a time and calls a procedure against it.
Find a detail comparison here

How does JIT replace optimized machine code during runtime?

I'm browsing through OpenJDK sources and cannot find the place where optimized code is replaced.
I wonder how this can be done in protected mode, isn't it some kind of selfmodifing code which should be prevented by the OS?

The "JITer" allocates space in say the heap or stack and inserts assembly code into it. No, self modifying code is perfectly fine. VirtualProtect (Windows) and mmap (Unix) can map pages as executable. General purpose operating systems by default will mark executable pages as read/execute but not write, you can still typically change this at runtime.
If there was no way to modify code, there would be no way to load a dll unless it's loaded to a fixed Virutal Address and shared into each process's address space; then you'd get address space hell instead of dll hell.
I'm guessing you heard of the NX bit or DEP etc, those just protect you from executing non-executable code, which helps a bit against stack overflows and the likes.

The JIT code doesn't replace optimized machine code; it replaces loaded Java bytecode. I don't know how this is implemented in OpenJDK, but typically, the JVM loads the byte code and keeps it in some form of internal structure, usually in a class that has a virtual function or virtual functions for executing the code. When it is just-in-time compiled, the pointer to that internal structure is replaced by a pointer to a class with the same interface, where the underlying representation is native machine code instead of Java byte code, and the virtual methods are implemented such that they invoke the native code rather than interpreting the byte code. There is no modification of code, merely pointing to different places.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.