I'm browsing through OpenJDK sources and cannot find the place where optimized code is replaced.
I wonder how this can be done in protected mode — isn't this a kind of self-modifying code that should be prevented by the OS?
The JIT compiler allocates memory and writes machine code into it. No, self-modifying code is perfectly fine: VirtualProtect (Windows) and mmap/mprotect (Unix) can map pages as executable. General-purpose operating systems by default mark executable pages as read/execute but not write, but you can typically change this at runtime.
If there were no way to modify code, there would be no way to load a DLL unless it were loaded at a fixed virtual address and shared into each process's address space; then you'd get address-space hell instead of DLL hell.
I'm guessing you've heard of the NX bit or DEP; those just prevent execution of non-executable pages, which helps a bit against stack-smashing attacks and the like.
The JIT code doesn't replace optimized machine code; it replaces loaded Java bytecode. I don't know how this is implemented in OpenJDK, but typically, the JVM loads the byte code and keeps it in some form of internal structure, usually in a class that has a virtual function or virtual functions for executing the code. When it is just-in-time compiled, the pointer to that internal structure is replaced by a pointer to a class with the same interface, where the underlying representation is native machine code instead of Java byte code, and the virtual methods are implemented such that they invoke the native code rather than interpreting the byte code. There is no modification of code, merely pointing to different places.
Related
I've been wondering where the interpreted bytecode of methods is stored internally in the JVM (specifically HotSpot on x64). I know that JIT-compiled methods are stored and can be accessed via the Method structure, but I'm trying to understand where the JVM stores the bytecode once converted to assembly instructions (I assume it stores them, otherwise interpreting every invocation from scratch would be very expensive), as I wasn't able to find it in the internals source code.
Interpreting bytecode is not as expensive as you might think. Why would the JVM spend time generating machine code for code that runs once? Better to wait until a method or block reaches the JIT threshold and only then spend the time compiling it.
The src/share/vm/interpreter subdirectory seems to be what you're after:
bytecodeInterpreter.cpp implements the actual stack machine;
bytecodes.cpp defines the shape and attributes of each opcode.
bytecodes.h declares all bytecodes.
templateTable.cpp contains machinery to map JVM opcodes to assembly.
cpu/*/vm/templateTable*.cpp contains the actual code to generate assembly snippets for the given CPU.
Does Java byte code include "processor instruction information"?
DO-178C Table A-6 "Testing of Outputs of Integration Process" states that the “Executable Object Code shall...”, where DO-178 defines object code as the following: “A low-level representation of the computer program not usually in a form directly usable by the target computer but in a form which includes relocation information in addition to the processor instruction information.”
Thus, I'm curious if Java bytecode would fit the DO-178C definition of "object code".
I'm not asking, as has been asked numerous times, the difference between byte code and object - I'm specifically interested in if the Java bytecode contains "processor instruction information".
Thanks a ton for your time and any feedback and insights.
According to Oracle, "JIT compilation of the byte code into native machine code has to occur before a method executes" (http://docs.oracle.com/cd/E15289_01/doc.40/e15058/underst_jit.htm). I guess that means the native processor instructions were lacking prior to that point. Based on this, it seems the answer is "no": Java bytecode does not include the native processor instructions that are present in the object code that comes out of a C compiler.
Moreover, Wikipedia (as much as it can be trusted) states: "Bytecode is not the machine code for any particular computer" (https://en.wikipedia.org/wiki/Just-in-time_compilation). Again, this seems to indicate that Java bytecode lacks the "processor instruction information" present within C object code.
This is a question of definitions rather than technical properties, but the answer would be yes. To begin, there are specialized processors designed at the gate level to parse and execute JVM bytecode (with some constraints). Even when the bytecode is not run on such a physical processor but on a JVM, the bytecode is the set of instructions for the JVM itself. However, this bytecode may later be converted into processor instructions run natively on the physical processor in use, by way of JIT compilation/optimization.
Yes, the bytecode is the processor instruction information.
The platform-specific instructions aren't part of the bytecode. The JVM walks through the .class file and does different things depending on which bytecode instruction it is currently looking at (it acts as a virtual CPU, hence the term virtual machine). This is of course a simplification, but you can think of the JVM as a massive switch statement. Some JVMs will analyse several bytecode instructions (an entire method, perhaps) to produce platform-specific machine code that the CPU executes directly when needed (JIT compilation).
I am wondering about the purpose of the JVM. If the JVM was created to allow platform-independent executable code, then couldn't a cross-compiler capable of producing platform-independent executable code replace the JVM?
The information about cross-compilers was retrieved from: http://en.wikipedia.org/wiki/Cross_compiler
The advantage of the bytecode format and the JVM is the ability to optimize code at runtime, based on profiling data acquired during the actual run. In other words, not having statically compiled native code is a win.
A particularly interesting example of the advantages of runtime compilation is monomorphic call sites: for each place in the code where an instance method is called, the runtime keeps track of exactly which object types the method is called on. In very many cases it will turn out that only one object type is involved, and the JVM will then compile that call as if it were a static method (no vtables involved). This further allows it to inline the call and then do even more optimizations such as escape analysis, register allocation, constant folding, and much more.
In fact, your criticism could (some say, should) be turned upside-down: why does Java define bytecode at all, fixing many design decisions which could have been left up to the implementation? The modern trend is to distribute source code and have the JIT compiler work on that.
The JVM does much more than compiling. The JVM is an interpreter for bytecode that also contains a JIT (just-in-time) compiler. Depending on the runtime context of the application, the same bytecode can be compiled differently (it decides at runtime how your bytecode is compiled). The JIT does a lot of optimization: it tries to compile your code in the most efficient way. A cross-compiler can't do (all of) this because it doesn't know how your code will be used at runtime. This is the big advantage of the JVM over a cross-compiler.
I haven't used a cross-compiler before, but I guess its advantage is that you have better control over how your code is compiled.
platform independent executable code
That's what Java bytecode is. The problem with "platform independent executable code" is that it can't be native to every platform (otherwise platform independence would be a trivial, uninteresting property). In other words, there is no format which runs natively on all platforms.
The JVM is, depending on your definition of the term, either the ISA which defines Java bytecode, or the component that allows Java bytecode to be run on platforms whose native format for executable code isn't Java bytecode.
Of course, there is an infinite design space for alternative platform independent executable code and the above is true for any other occupant of said space. So yes, in a sense you can replace the JVM with another thing which fulfills the same function for another platform independent executable code format.
I have done some reading on the internet, and some people say that a Java application is executed by the Java virtual machine (JVM). The word "execute" confuses me a little. As I understand it, a non-Java application (e.g., written in C or C++) can be executed by the operating system. At a lower level, this means the OS loads the binary program into memory, then directs the CPU to execute the instructions in memory.
So what happens with a JVM? As I understand it, the JVM (which contains a run-time environment) is called first by the OS. From that point on, the JVM spawns one (or more) threads for the application. I wonder whether the role of the OS comes into play any more. It seems to me that the JVM has "by-passed" the OS and directly instructs the CPU to execute the application. If so, why do we need the OS?
Taking it a little further: the JVM will use its JIT to compile the application's bytecode into machine code, then execute that machine code. Since it is already machine code, do we need the JVM any more? After all, instead of the JVM, the OS could instruct the CPU to execute that machine code. Am I making a mistake here?
I would like to learn more from people here. Please correct me if I am wrong. Thank you so much!
We need the OS for all the things a C or C++ program would. The JVM does a few more things by default, but it doesn't replace anything the OS does. The only difference might be that sometimes you have Your Code [calls the] JVM [calls the] OS, or with compiled code you can have Your Code [calls the] OS
Similarly in C++ you might have Your Code [calls the] Boost [calls the] OS.
When your program is running as native code, it doesn't need the JVM as such. This is good because the JVM knows when to "stand back" and let the application run. However, not all of the program will be compiled to native code for the rest of the application's life, so you still need it.
It is possible to use kernel-bypass devices/drivers with JNI, but Java doesn't directly support this sort of feature.
It seems to me that the JVM has "by-passed" the OS and directly instructs the CPU to execute the application. If so, why do we need the OS?
All C/C++ binaries (not just the JVM) run directly on the CPU. Once running, these programs can call into more machine code provided by the operating system to do useful things like reading files, starting threads, or using the network.
The JVM translates a Java program into instructions that run on the CPU. Behind the scenes, though, Java's threads, file i/o, and network sockets (to name a few) all contain instructions that call into the code provided by the operating system for threads/files/etc. This is one of the reasons you still need the OS.
Since it is already machine code, do we need the JVM any more?
The JVM provides features that you don't get from the JIT compiler. At the end of the day the JVM is just running a lot of machine code, but not all of that machine code comes from the JIT (or from the interpreter). Some of that machine code does garbage collection, for example. That's why you need the JVM.
The underlying base O/S still has to do almost everything for the JVM, not least:
Input / Output
Memory management
Creation of threads (if using native threads)
Time sharing - i.e. allowing more than one process to run
and lots more besides!
Well, I want to keep this simple.
Think of how you coded on a ZX Spectrum, back in the old days when you really didn't use an OS (even before the DOS era, in the pre-PC era). You wrote your code and had to manage everything yourself. In many cases there was no compiler, so your program was interpreted.
Next, it was realized that an OS is a great thing, and programs became simpler. Compilers also came into broader use. Take C++, for example: in those programs, if you needed to call some OS function, you added the needed library and made your call. One drawback was that your program was now OS-dependent; another was that your program depended on an OS DLL in some fixed version. If another program on the same machine required that DLL in a different version, you were in trouble.
In the early days of JVM history, no JIT compiler was in use, so your program ran in interpreted mode. Your application no longer needed to call the OS directly; instead it used the JVM for everything it needed, and the JVM redirected some of the application's calls to the OS. Think of the JVM as a mediator. One of its best features was universality: you didn't need to tie yourself to a specific OS (in practice you might still need minor adjustments, when you stray from the Java requirements and your program "happens" to work on one specific OS — for example, using C:\ paths, or relying on thread-scheduler behaviour that holds on the current OS but is not guaranteed by the JVM). The JVM's designers developed an API that is easy for Java developers to use on the one hand, and can be mapped to any OS's system calls on the other.
The JVM gives you more than a simple wrapper around the OS. It has its own memory model (for thread synchronization), which makes some weak guarantees of its own (it was thoroughly revised in JDK 1.5 because it was broken). It also has garbage collection, and it initializes variables to default values (a field int i; will be initialized to 0). So besides mediating with the OS, the JVM acts as helper code for your own application.
At some point the JIT was added, to reduce the overhead the JVM creates. When certain assumptions hold, usually after some executions of the code, the interpreted bytecode can be compiled to machine code. It is an optimization, and normally you shouldn't be affected by it.
In JDK 1.6 another optimization was added: now some objects, in some circumstances, can be allocated on the stack rather than on the heap. I don't know whether it has side-effects, but it is an example of what the JVM can do for you.
And my last remark. When you compile your code, what really happens is that your program is checked for correctness and then bytecode is generated (a .class file). The Java language uses a subset of the existing bytecodes (this is how AOP was implemented: using existing bytecodes that were not part of the Java language). When the Java program is executed, these bytecodes are interpreted — translated on-the-fly into machine instructions. If the JIT is on, then some parts of the execution can be compiled to machine language and reused instead of being interpreted on-the-fly.
Since it is already machine code, do we need the JVM any more?
Compiled Java programs are not machine code. javac compiles a .java file into a bytecode .class file. These bytecodes are then given to the JRE (Java Runtime Environment).
Now the Java interpreter comes into action: it translates the bytecode into native machine code that runs on the CPU.
As we know, the OS does not execute a program itself; it provides the environment for the processor to execute it: it allocates memory, loads the file, hands instructions to the processor, and manages the addresses of the loaded data and methods. The processor's job is only to execute the program. This is how it happens in C or any procedural programming language, and the OS plays a very vital role in it, which puts overhead on the OS.
If we write a small, simple C program like Hello World, which contains only a main function, compiling it generates an executable with more than one function, taken from library functions; managing all of this is a tedious job for the OS. The JVM gives the OS some relief here: the OS's work is only to load the JVM from the hard disk into RAM, let the JVM execute, and allocate space for the JVM to run the Java program. Memory allocation and de-allocation, loading of bytecode files from the hard disk, and address management are done by the JVM itself, within whatever the OS has given it to execute the Java program, so the OS is freer to do other work.
As for execution, the JVM contains an interpreter as well as a JIT compiler, which converts the bytecode of a required method into machine code; after the method finishes, its executable code can be discarded. That's why we can say Java doesn't have a .EXE file.
I couldn't find the difference between JIT and Interpreters.
JIT is intermediate between interpreters and compilers. During runtime, it converts bytecode to machine code (for the JVM or for the actual machine?). The next time, it takes it from the cache and runs it.
Am I right?
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
How will the real processor in our PC understand the instructions?
Please clear my doubts.
First things first:
With the JVM, both the interpreter and the compiler (the JVM's compiler, not a source-code compiler like javac) produce native code (i.e., machine-language code for the underlying physical CPU, such as x86) from bytecode.
What's the difference then:
The difference is in how they generate the native code, how optimized it is, and how costly the optimization is. Informally, an interpreter pretty much converts each bytecode instruction to the corresponding native instructions by looking up a predefined JVM-instruction-to-machine-instruction mapping. Interestingly, a further speedup in execution can be achieved if we take a section of bytecode and convert it into machine code, because considering a whole logical section often provides room for optimization, as opposed to converting (interpreting) each instruction in isolation. This act of converting a section of bytecode into (presumably optimized) machine instructions is called compiling (in the current context). When the compilation is done at run-time, the compiler is called a JIT compiler.
The correlation and coordination:
Since the Java designers went for (hardware and OS) portability, they chose an interpreter architecture (as opposed to C-style compiling, assembling, and linking). However, to gain more speed, a compiler is also optionally added to a JVM. Nonetheless, as a program is interpreted (and executed on the physical CPU), "hotspots" are detected by the JVM and statistics are gathered. Then, using the interpreter's statistics, those sections become candidates for compilation (to optimized native code). This is in fact done on-the-fly (hence "JIT compiler"), and the compiled machine instructions are used subsequently (rather than being interpreted). Naturally, the JVM also caches such compiled pieces of code.
Words of caution:
These are pretty much the fundamental concepts. If an actual JVM implementer does it a bit differently, don't be surprised. The same could be true of VMs for other languages.
Words of caution:
Statements like "the interpreter executes bytecode in a virtual processor", "the interpreter executes bytecode directly", etc. are all correct as long as you understand that in the end there is a set of machine instructions that has to run on physical hardware.
Some Good References: [I've not done extensive search though]
[paper] Instruction Folding in a Hardware-Translation Based Java Virtual Machine, by Hitoshi Oi
[book] Computer organization and design, 4th ed, D. A. Patterson. (see Fig 2.23)
[web-article] JVM performance optimization, Part 2: Compilers, by Eva Andreasson (JavaWorld)
PS: I've used the following terms interchangeably: 'native code', 'machine language code', 'machine instructions', etc.
Interpreter: Reads your source code or some intermediate representation (bytecode) of it, and executes it directly.
JIT compiler: Reads your source code, or more typically some intermediate representation (bytecode) of it, compiles that on the fly and executes native code.
JIT is intermediate between interpreters and compilers. During runtime, it converts bytecode to machine code (for the JVM or for the actual machine?). The next time, it takes it from the cache and runs it. Am I right?
Yes you are.
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
Yes, it is.
How the real processor in our pc will understand the instruction.?
In the case of interpreters, the virtual machine executes a native JVM procedure corresponding to each instruction in the bytecode to produce the expected behaviour. Your code isn't actually compiled to native code, as with JIT compilers; the JVM emulates the expected behaviour of each instruction.
A JIT Compiler translates byte code into machine code and then execute the machine code.
Interpreters read your high-level language (interpreting it) and execute what your program asks. Interpreters normally don't pass through byte-compilation or JIT compilation.
But the two worlds have merged, because numerous interpreters have taken the path of internal byte-compilation and JIT compilation for better execution speed.
Interpreter: interprets the bytecode; if a method is called multiple times, a new interpretation is required every time.
JIT: when code is called multiple times, the JIT converts the bytecode into native code and executes it.
I'm pretty sure that the JIT turns bytecode into machine code for whatever machine you're running on, right as it's needed. The alternative to this is to run the bytecode in the Java virtual machine's interpreter. I'm not sure whether that is the same as "interpreting" the code, since I'm more familiar with that term being used to describe the execution of a scripting (non-compiled) language like Ruby or Perl.
The first time a class is referenced in the JVM, the JIT execution engine re-compiles the .class files (the primary binaries) generated by the Java compiler, which contain the JVM's instruction set, into binaries containing the host system's instruction set. The JIT stores and reuses those recompiled binaries in memory going forward, thereby reducing interpretation time and benefiting from native code execution.
And there is another flavor, which does adaptive optimization: it identifies the most-reused parts of the app and applies the JIT only to them, thereby optimizing memory usage.
On the other hand, a plain old Java interpreter interprets one JVM instruction from the class file at a time and calls a procedure for it.