I've been wondering where the interpreted bytecode of methods is stored internally in the JVM (specifically HotSpot on x64). I know that JIT-compiled methods are stored and can be accessed through the Method structure, but I'm trying to understand where the JVM stores the bytecode once converted to assembly instructions (I assume it stores them, otherwise it would be very expensive to re-translate on every invocation), as I wasn't able to find it in the internals source code.
Interpreting bytecode is not as expensive as you might think. Why would the JVM spend time generating machine code for code that runs once? It's better to wait until a method or block reaches the JIT threshold and only then spend the time compiling it.
The src/share/vm/interpreter subdirectory seems to be what you're after:
bytecodeInterpreter.cpp implements the actual stack machine;
bytecodes.cpp defines the shape and attributes of each opcode;
bytecodes.h declares all bytecodes;
templateTable.cpp contains the machinery to map JVM opcodes to assembly;
cpu/*/vm/templateTable*.cpp contains the actual code to generate assembly snippets for the given CPU.
Related
Does Java byte code include "processor instruction information"?
DO-178C Table A-6 "Testing of Outputs of Integration Process" states that the “Executable Object Code shall...”, where DO-178 defines object code as the following: “A low-level representation of the computer program not usually in a form directly usable by the target computer but in a form which includes relocation information in addition to the processor instruction information.”
Thus, I'm curious if Java bytecode would fit the DO-178C definition of "object code".
I'm not asking, as has been asked numerous times, about the difference between bytecode and object code - I'm specifically interested in whether Java bytecode contains "processor instruction information".
Thanks a ton for your time and any feedback and insights.
According to Oracle: "JIT compilation of the byte code into native machine code has to occur before a method executes" (http://docs.oracle.com/cd/E15289_01/doc.40/e15058/underst_jit.htm). I take that to mean the native processor instructions were absent before that point. Based on this, it seems the answer is "no": Java bytecode does not include the native processor instructions that are present in the object code produced by a C compiler.
Moreover, Wikipedia (for what it's worth) states: "Bytecode is not the machine code for any particular computer" (https://en.wikipedia.org/wiki/Just-in-time_compilation). This again seems to indicate that Java bytecode lacks the "processor instruction information" that is present in C object code.
This is a question of definitions rather than technical properties, but the answer would be yes. To begin, there are specialized processors designed at the gate level to parse and execute JVM bytecode (with some constraints). Even when the bytecode is not run on such a physical processor but on a JVM, the bytecode is the set of instructions for the JVM itself. However, this bytecode may later be converted to processor instructions run natively on the physical processor in use, by way of JIT compilation/optimization.
Yes, the bytecode is the processor instruction information.
The platform-specific instructions aren't part of the bytecode. The JVM walks through the .class file and does different things depending on which bytecode instruction it is currently looking at (it acts as a virtual CPU, hence the terminology "virtual machine"). This is of course a simplification, but you can think of the JVM as a massive switch statement. Some JVMs will analyse several bytecode instructions at once (an entire method, perhaps) to produce platform-specific machine code that is executed by the CPU directly when needed (JIT compilation).
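The "massive switch statement" view can be sketched in a few lines. This is only a toy model, not HotSpot's actual interpreter (which generates per-opcode assembly templates at VM startup); the opcodes and their encoding below are invented for illustration.

```java
// Toy bytecode interpreter: a loop around a switch, one case per opcode.
// The opcodes below are invented for illustration; they are not real JVM opcodes.
class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {                       // fetch, decode, dispatch
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case MUL:  stack[sp - 2] *= stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];        // result is on top of the stack
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program));   // prints 20
    }
}
```

Each iteration of the loop pays the dispatch cost again, which is exactly the overhead a JIT compiler removes by translating a whole method at once.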
I have come across a few references to JVM/JIT activity where a distinction is made between compiling bytecode and interpreting bytecode. One particular comment stated that bytecode is interpreted for the first 10,000 runs and compiled thereafter.
What is the difference between "compiling" and "interpreting" bytecode?
Interpreting bytecode basically means reading it instruction by instruction, parsing and executing each one in real time with no optimization to speak of. This is notably inefficient for a number of reasons, including the fact that Java bytecode isn't designed to be interpreted quickly.
When a method is compiled, the JIT loads the whole method and generates native code to run directly on the CPU, rather than reading and interpreting the bytecode instruction by instruction. After the method is compiled once, the generated native code is used directly every time the method is called. This is astronomically faster, but incurs some overhead when the method gets compiled; among other things, the JVM is responsible for compiling only frequently called methods, to minimize the overhead while maximizing the performance of "tight inner loop" code that gets called extremely often.
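You can watch this happen with HotSpot's real diagnostic flag -XX:+PrintCompilation; the hot method below is my own made-up example. After enough iterations, a log line mentioning the method should appear (the exact threshold and log format vary by JVM version and tier).

```java
// Run with: java -XX:+PrintCompilation HotLoop
// After a few thousand iterations, a compilation log line mentioning
// HotLoop::square should appear (exact threshold varies by JVM and tier).
class HotLoop {
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {   // hot enough to trigger the JIT
            sum += square(i);
        }
        System.out.println(sum);
    }
}
```

The flag -XX:-TieredCompilation or -XX:CompileThreshold (C2-only mode) can be used to experiment with when compilation kicks in, though the defaults differ across JVM versions.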
When the bytecode is interpreted, it is executed by the JVM's interpreter rather than directly on the processor; when it is compiled, it is translated to native machine code and executed directly on the CPU.
The JVM has a Just-In-Time (JIT) compiler; portions of the code that are executed often enough may be compiled to native machine code to speed them up.
Note that the change is done in the JVM only, the class (jar/war) files are not changed and remain as bytecode.
I am wondering about the purpose of JVM. If JVM was created to allow platform independent executable code, then can't a cross-compiler which is capable of producing platform independent executable code replace a JVM?
the information about cross-compiler was retrieved from: http://en.wikipedia.org/wiki/Cross_compiler
The advantage of the bytecode format and the JVM is the ability to optimize code at runtime, based on profiling data acquired during the actual run. In other words, not having statically compiled native code is a win.
A specifically interesting example of the advantages of runtime compilation is monomorphic call sites: for each place in the code where an instance method is called, the runtime keeps track of exactly which object types the method is called on. In very many cases it turns out that only one object type is involved, and the JVM will then compile that call as if it were a static call (no vtables involved). This further allows it to inline the call and then apply even more optimizations such as escape analysis, register allocation, constant folding, and much more.
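A hypothetical illustration of a monomorphic call site: the static type is an interface, so the call is virtual in the bytecode, but if at runtime only one implementation ever reaches it, the JIT can devirtualize and inline it. The class names here are made up for the example.

```java
// s.area() below compiles to a virtual dispatch (invokeinterface) in the
// bytecode, but if profiling shows only Circle ever reaches the call site,
// the JIT can treat it as a direct call to Circle.area() and inline it.
interface Shape {
    double area();
}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Monomorphic {
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();   // monomorphic in practice: always a Circle
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Circle(2) };
        System.out.println(totalArea(shapes));
    }
}
```

A static compiler could not safely make this assumption, because any class implementing Shape might be loaded later; the JVM can, because it is able to deoptimize and fall back if a second implementation ever shows up.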
In fact, your criticism could (some say, should) be turned upside-down: why does Java define bytecode at all, fixing many design decisions which could have been left up to the implementation? The modern trend is to distribute source code and have the JIT compiler work on that.
The JVM does much more than compiling. The JVM is an interpreter for bytecode that also contains a JIT (just-in-time) compiler. Depending on the context of the application, the same bytecode can be compiled differently (the JVM decides at runtime how your bytecode is compiled). The JIT does a lot of optimization; it tries to compile your code in the most efficient way. A cross-compiler can't do (all of) this because it doesn't know how your code will be used at runtime. This is the big advantage of the JVM over a cross-compiler.
I haven't used a cross-compiler before, but I guess its advantage is that you have better control over how your code is compiled.
platform independent executable code
That's what Java bytecode is. The problem with "platform independent executable code" is that it can't be native to every platform (otherwise being platform independent would be a trivial, uninteresting property). In other words, there is no format which runs natively on all platforms.
The JVM is, depending on your definition of the term, either the ISA which defines Java bytecode, or the component that allows Java bytecode to be run on platforms whose native format for executable code isn't Java bytecode.
Of course, there is an infinite design space for alternative platform independent executable code and the above is true for any other occupant of said space. So yes, in a sense you can replace the JVM with another thing which fulfills the same function for another platform independent executable code format.
Does Java JIT compile the bytecode with the same optimizations at every run on the same machine?
Does it take into consideration dynamic factors like CPU usage at a given moment, or will it make the same optimizations every time regardless of temporary factors?
No, the optimizations are non-deterministic. Even if you run the exact same single-threaded, fully deterministic program, the sampler used by the JIT to determine which methods to optimize could choose a different set.
Another thing that can change the generated machine code is the actual memory locations of certain constants that are referenced by the code. The JIT can emit machine instructions that directly access these memory locations, resulting in additional differences between the machine code on different passes.
Researchers using the Jikes RVM have addressed this problem for their benchmarks by using a feature called Compiler Replay.
I can't work out the difference between a JIT and an interpreter.
JIT is intermediary between interpreters and compilers. During runtime, it converts bytecode to machine code (for the JVM or the actual machine?). The next time, it takes it from the cache and runs it.
Am I right?
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
How will the real processor in our PC understand the instructions?
Please clear my doubts.
First things first:
In a JVM, both the interpreter and the compiler (the JVM's compiler, not a source-code compiler like javac) produce native code (i.e. machine language code for the underlying physical CPU, such as x86) from bytecode.
What's the difference then:
The difference lies in how they generate the native code, how optimized it is, and how costly the optimization is. Informally, an interpreter pretty much converts each bytecode instruction to the corresponding native instructions by looking up a predefined JVM-instruction-to-machine-instruction mapping (see the picture below). Interestingly, a further speedup in execution can be achieved if we take a whole section of bytecode and convert it into machine code, because considering a whole logical section often provides room for optimization, as opposed to converting (interpreting) each instruction in isolation. This act of converting a section of bytecode into (presumably optimized) machine instructions is called compiling (in the current context). When the compilation is done at run time, the compiler is called a JIT compiler.
The correlation and coordination:
Since the Java designers went for (hardware and OS) portability, they chose an interpreter architecture (as opposed to C-style compiling, assembling, and linking). However, to achieve more speed, a compiler is also optionally added to a JVM. Nonetheless, as a program goes on being interpreted (and executed on the physical CPU), "hotspots" are detected by the JVM and statistics are gathered. Consequently, using the interpreter's statistics, those sections become candidates for compilation (to optimized native code). This is in fact done on the fly (hence "JIT compiler"), and the compiled machine instructions are used subsequently (rather than being interpreted). Naturally, the JVM also caches such compiled pieces of code.
Words of caution:
These are pretty much the fundamental concepts. If an actual JVM implementation does things a bit differently, don't be surprised. The same may be true of VMs for other languages.
Statements like "the interpreter executes the byte code in a virtual processor" or "the interpreter executes the byte code directly" are all correct, as long as you understand that in the end there is a set of machine instructions that has to run on physical hardware.
Some Good References: [I've not done extensive search though]
[paper] Instruction Folding in a Hardware-Translation Based Java Virtual Machine, by Hitoshi Oi
[book] Computer organization and design, 4th ed, D. A. Patterson. (see Fig 2.23)
[web-article] JVM performance optimization, Part 2: Compilers, by Eva Andreasson (JavaWorld)
PS: I've used the following terms interchangeably: 'native code', 'machine language code', 'machine instructions', etc.
Interpreter: Reads your source code or some intermediate representation (bytecode) of it, and executes it directly.
JIT compiler: Reads your source code, or more typically some intermediate representation (bytecode) of it, compiles that on the fly and executes native code.
JIT is intermediary between interpreters and compilers. During runtime, it converts bytecode to machine code (for the JVM or the actual machine?). The next time, it takes it from the cache and runs it. Am I right?
Yes you are.
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
Yes, it is.
How will the real processor in our PC understand the instructions?
In the case of an interpreter, the virtual machine executes a native JVM procedure corresponding to each instruction in the bytecode to produce the expected behaviour. Your code isn't actually compiled to native code, as it is with JIT compilers; instead, the JVM emulates the expected behaviour of each instruction.
A JIT compiler translates bytecode into machine code and then executes the machine code.
Interpreters read your high-level language (interpret it) and execute what your program asks. Interpreters traditionally do not go through bytecode or JIT compilation.
But the two worlds have merged, because numerous interpreters have taken the path of internal byte-compilation and JIT compilation for better execution speed.
Interpreter: interprets the bytecode; if a method is called multiple times, a fresh interpretation is required every time.
JIT: when code is called multiple times, the JIT converts the bytecode to native code once and executes that instead.
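This interpret-first, compile-after-a-threshold lifecycle can be mimicked in plain Java. This is a toy model, not how a real JIT works: "compilation" here just swaps in a precomputed closure standing in for native code, and the threshold and names are invented.

```java
import java.util.function.IntUnaryOperator;

// Toy model of the JIT lifecycle: interpret a method until it gets hot,
// then switch to a cached "compiled" version (a lambda standing in for
// native code). Threshold and names are invented for illustration.
class TieredToy {
    static final int THRESHOLD = 3;            // real JVMs use much larger counts
    static int invocations = 0;
    static IntUnaryOperator compiled = null;   // the "code cache" entry

    // The "method": double the argument, stepped through like bytecode.
    static int interpret(int x) {
        int acc = x;       // simulate executing one instruction at a time,
        acc = acc + x;     // paying the dispatch cost on every call
        return acc;
    }

    static int invoke(int x) {
        if (compiled != null) {
            return compiled.applyAsInt(x);     // fast path: reuse cached "native code"
        }
        if (++invocations >= THRESHOLD) {
            compiled = y -> 2 * y;             // "compile" once, cache forever
        }
        return interpret(x);                   // slow path: interpret
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            System.out.println(invoke(i));     // same answers before and after compiling
        }
    }
}
```

The key property the toy preserves is that the observable result never changes; only where the work happens (interpreter loop vs. cached compiled form) does.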
I'm pretty sure that JIT turns bytecode into machine code for whatever machine you're running on, right as it's needed. The alternative to this is to run the bytecode in the Java virtual machine's interpreter. I'm not sure if that is the same as "interpreting" the code, since I'm more familiar with that term being used to describe the execution of a scripting (non-compiled) language like Ruby or Perl.
The first time a class is referenced in the JVM, the JIT execution engine recompiles the .class files (the primary binaries) generated by the Java compiler, which contain the JVM's instruction set, into binaries containing the host system's instruction set. The JIT stores and reuses those recompiled binaries in memory going forward, thereby reducing interpretation time and benefiting from native code execution.
And there is another flavor that does adaptive optimization: it identifies the most heavily reused parts of the app and applies JIT compilation only to them, thereby economizing on memory usage.
On the other hand, a plain old Java interpreter interprets one JVM instruction from the class file at a time and calls a procedure for it.