Java code and JIT compilation - java

Java code is compiled to bytecode, which is portable across many platforms.
But Java is also JIT compiled, which happens on the fly.
Does this mean Java is compiled twice? First by us to produce the bytecode, and a second time by the JVM?
Thanks.

Your code may be compiled from bytecode into native code by the JVM if it's "hot enough"; and it may be so compiled several times, with the old version being discarded, depending on the runtime characteristics of your program.
The JIT is a complicated beast; in fact the Sun JVM has two JITs (-client and -server) that behave differently from each other, and some implementations even support both JITs running together (so you may have interpreted bytecode running alongside code compiled by two different JITs in your application).
I recommend you read more about HotSpot (the most common JIT, since it's the Sun one) if you're really interested in this subject. You can start at Sun's page.
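As a minimal sketch of how to watch this happen (the class name HotLoop and the loop count are my own invention, and the exact log format of -XX:+PrintCompilation varies between JVM versions):

    // HotLoop.java -- a tiny program whose square() method quickly becomes "hot"
    public class HotLoop {
        static int square(int x) {
            return x * x;
        }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += square(i);   // called often enough to trigger JIT compilation
            }
            System.out.println(sum);
        }
    }

Compile it with javac HotLoop.java and run it with java -XX:+PrintCompilation HotLoop; HotSpot logs each method as it compiles it, and square() should show up once it is considered hot enough.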

To my knowledge, when you compile, you are compiling for the JVM platform. Then when you run the app on the JVM on any machine, certain parts of the code that are used frequently get compiled into code native to that machine for optimization.
So in short: yes, but for a very good reason.

Does this mean Java is compiled twice? First by us to produce the bytecode, and a second time by the JVM? Thanks.
You could say that: once using information available in the source code (the compiler), and then again at runtime (the JVM/JIT), when information about the specific hardware is available, along with some profiling to decide what gets JIT-compiled or not.

The mechanism is:
Java source -> bytecode (compiled by the Java compiler)
bytecode -> native code (interpreted or JIT-compiled by the JVM)
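As a small illustration of the first step (the file name Hello.java is just an example), javac produces a .class file containing bytecode, and javap can disassemble it so you can see the instructions the JVM will later execute:

    // Hello.java
    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello");
        }
    }

    javac Hello.java   # source -> bytecode (Hello.class)
    javap -c Hello     # disassemble the bytecode: getstatic, ldc, invokevirtual, return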

Short answer: yes, kind of.
The longer one: those are two different things.
The first compilation is from the source code to the bytecode, often called the intermediate representation (IR) in the compilation field.
Then the VM takes the bytecode and translates it into native code for the platform on which it is installed.
These are two completely different kinds of compilation. The second is not even quite a compilation, since there is no syntax checking or scope analysis... Well, there are some checks, but not the same kind of checks that you have in a compiler.

Related

Cross-compiler vs JVM

I am wondering about the purpose of the JVM. If the JVM was created to allow platform-independent executable code, then can't a cross-compiler which is capable of producing platform-independent executable code replace the JVM?
The information about cross-compilers was retrieved from: http://en.wikipedia.org/wiki/Cross_compiler
The advantage of the bytecode format and the JVM is the ability to optimize code at runtime, based on profiling data acquired during the actual run. In other words, not having statically compiled native code is a win.
A particularly interesting example of the advantages of runtime compilation is monomorphic call sites: for each place in the code where an instance method is being called, the runtime keeps track of exactly which object types the method is called on. In very many cases it will turn out that there is only one object type involved, and the JVM will then compile that call as if it were a static method (no vtables involved). This further allows it to inline the call and then do even more optimizations such as escape analysis, register allocation, constant folding, and much more.
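To make that concrete, here is a hedged sketch (the class names are invented, not taken from the answer above): if at runtime the call site below only ever sees Circle objects, HotSpot can treat shape.area() as a monomorphic call, devirtualize and inline it, and deoptimize later if a second subclass ever reaches that site.

    abstract class Shape {
        abstract double area();
    }

    class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        double area() { return Math.PI * radius * radius; }
    }

    public class Monomorphic {
        // Only Circle instances ever reach this loop at runtime, so the
        // shape.area() call site is monomorphic and a candidate for inlining.
        static double totalArea(Shape[] shapes) {
            double sum = 0;
            for (Shape shape : shapes) {
                sum += shape.area();
            }
            return sum;
        }

        public static void main(String[] args) {
            Shape[] shapes = new Shape[1_000_000];
            for (int i = 0; i < shapes.length; i++) {
                shapes[i] = new Circle(i % 10);
            }
            System.out.println(totalArea(shapes));
        }
    }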
In fact, your criticism could (some say, should) be turned upside-down: why does Java define bytecode at all, fixing many design decisions which could have been left up to the implementation? The modern trend is to distribute source code and have the JIT compiler work on that.
The JVM does much more than compiling. The JVM is an interpreter for bytecode, which also contains a JIT (just-in-time) compiler that compiles bytecode - but depending on the context of the application the same bytecode can be compiled differently based on the runtime context (it decides at runtime how your bytecode is compiled). The JIT does a lot of optimization - it tries to compile your code in the most efficient way. A cross-compiler can't do (all of) this because it doesn't know how your code will be used at runtime. This is the big advantage of the JVM over a cross-compiler.
I haven't used a cross-compiler before, but I guess the advantage of a cross-compiler is that you have better control over how your code is compiled.
platform independent executable code
That's what Java bytecode is. The problem with "platform independent executable code" is that it can't be native to every platform (otherwise being platform independent would be a trivial, uninteresting property). In other words, there is no format which runs natively on all platforms.
The JVM is, depending on your definition of the term, either the ISA which defines Java bytecode, or the component that allows Java bytecode to be run on platforms whose native format for executable code isn't Java bytecode.
Of course, there is an infinite design space for alternative platform independent executable code and the above is true for any other occupant of said space. So yes, in a sense you can replace the JVM with another thing which fulfills the same function for another platform independent executable code format.

How exactly does the JVM interpret a byte code? [duplicate]

I heard many times that Java implements JIT (just-in-time) compilation, and that its bytecode, which is portable across platforms, gets "interpreted" by the JVM. However, I don't really know what the bytecode is, or what the JVM actually means in the Java language architecture; I would like to know more about them.
The JVM (Java Virtual Machine) has an instruction set just like a real machine. The name given to this instruction set is Java Bytecode. It is described in the Java Virtual Machine Specification. Other languages are translated into a bytecode before execution, for example Ruby and Python. Java's bytecode is at a fairly low level while Python's is much more high-level.
Interpretation and JIT compilation are two different strategies for executing bytecode. Interpretation processes bytecodes one at a time making the changes to the virtual machine state that are encoded in each instruction. JIT compilation translates the bytecode into instructions native to the host platform that carry out equivalent operations.
Interpretation is generally quick to start but slow during execution, while JIT has more startup overhead but runs quicker afterwards. Modern JVMs use a combination of interpretation and JIT techniques to get the benefit of both. The bytecode is first interpreted while the JIT is translating it in the background. Once the JIT compilation is complete, the JVM switches to using that code instead of the interpreter. Sometimes JIT compilation can produce better results than the ahead-of-time compilation used for C and C++ because it is more dynamic. The JVM can keep track of how often code is called and what the typical paths through the code are and use this information to generate more efficient code while the program is running. The JVM can switch to this new code just like when it initially switches from the interpreter to the JIT code.
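One rough way to see these strategies side by side on HotSpot (MyApp is a placeholder for your own main class, and exact behaviour varies by JVM version):

    java -Xint MyApp    # interpret everything: quick to start, slow to run
    java -Xcomp MyApp   # compile everything up front: slower start-up, native code throughout
    java MyApp          # default mixed mode: interpret first, JIT-compile the hot methods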
Just like there are other languages that compile to native code, like C, C++, and Fortran, there are compilers for other languages that output JVM bytecode. One example is the Scala language. I believe that Groovy and JRuby can also compile to Java bytecode.
Bytecode is a step between your source code and actual machine code. The JVM is what takes the bytecode and translates it into machine code.
JIT refers to the fact that the JVM does this translation on the fly when the program is executed, rather than in a single step (like in a traditionally compiled/linked language like C or C++)
The point of bytecode is that you get better performance than a strictly interpreted language (like PHP for example) because the bytecode is already partially compiled and optimized. Also, since the bytecode doesn't need to be directly interpreted by the CPU, it doesn't need to be tied to any specific CPU architecture which makes it more portable.
The disadvantage of course is that it will generally be a bit slower than a natively compiled application since the JVM still has to do some work in translating the bytecode to machine code.
When you compile something in Java, the compiler generates bytecode. This is native code for the Java Virtual Machine. The JVM then translates the bytecode to native code for your processor/architecture, this is where the JIT happens. Without JIT, the JVM would translate the program one instruction at a time, which is very slow.
Bytecode is the JVM equivalent of machine language instructions.
jcyang already provided a link to wikipedia, but this one is a better match to your question:
Java Bytecode
The Java compiler compiles Java source code to class files. The class's methods are translated to bytecode and the Java Virtual Machine (JVM) interprets this bytecode. A Just-In-Time compiler (JIT) may be used to translate the bytecode to machine code to speed up execution of class methods.
The JVM is a virtual machine which is used to run Java code. You could compare the JVM to a compiler in that without it we cannot turn our programs into running applications, but its job is different: the compiler converts Java source code into Java bytecode, and the main task of the JVM is to load that bytecode and execute it, interpreting it or JIT-compiling it to native code. This makes Java development easy. Check out this article if you want to know more about how the Java Virtual Machine works.

Java bytecode interpreter

I know that Java programs are first compiled and bytecode is generated, which is platform independent. But my question is: why is this bytecode interpreted in the next stage and not compiled, even though compiled code runs faster than interpreted code in general?
You answered your own question. Bytecode is platform independent. If natively compiled code were distributed instead, it would not work on every OS. This is what C does, and it is why you have to have one version for every OS.
As others have suggested, the JVM does actually compile the code using JIT. It is just not saved anywhere. Here is a nice quote to sum it up:
In a bytecode-compiled system, source code is translated to an intermediate representation known as bytecode. Bytecode is not the machine code for any particular computer, and may be portable among computer architectures. The bytecode may then be interpreted by, or run on, a virtual machine. The JIT compiler reads the bytecodes in many sections (or in full rarely) and compiles them interactively into machine language so the program can run faster.
Java bytecode is normally compiled via just-in-time (JIT) compilation.
So you still end up with fully compiled native code being executed, the only difference is that this native code is generated by the JVM at runtime, rather than being statically generated at the time the source code is compiled (as would happen with C/C++).
This gives Java two big advantages:
By delaying the compilation until runtime, the bytecode remains fully portable across platforms
In some cases the JIT compiler can actually generate more optimised native code because it is able to exploit statistics gathered by examining the execution paths of the code at runtime.
The downside, of course, is that the JIT compiler needs to do its work at application start-up, which explains why JVM applications can have a slightly longer start-up time compared to natively compiled apps.
The basic premise of your question is not true. Most modern Java virtual machines do compile frequently-executed parts of the code into native machine code.
This is known as just-in-time compilation, or JIT for short.
A pretty good introduction to the relevant Sun (now Oracle) technology can be found here.
The JVM uses just-in-time compilation (http://en.wikipedia.org/wiki/Just-in-time_compilation), so it's much faster than pure interpretation.
Bytecode is platform independent. Once compiled into bytecode, it can run on any system that has a JVM.
As it says on Wikipedia,
Just-in-time compilation (JIT), also known as dynamic translation, is a method to improve the runtime performance of computer programs.
I recommend you read this article. It gives the basic working of the JIT compiler for the Java VM.
JIT compilers alter the role of the VM a little by directly compiling Java bytecode into native platform code, thereby relieving the VM of its need to manually call underlying native system services. The purpose of JIT compilers, however, isn't to allow the VM to relax. By compiling bytecodes into native code, execution speed can be greatly improved because the native code can be executed directly on the underlying platform.
When JIT compiler is installed, instead of the VM calling the underlying native operating system, it calls the JIT compiler. The JIT compiler in turn generates native code that can be passed on to the native operating system for execution. The primary benefit of this arrangement is that the JIT compiler is completely transparent to everything except the VM.

Why is java bytecode interpreted?

As far as I understand, Java compiles to Java bytecode, which can then be interpreted by any machine running Java, for its specific CPU. Java uses a JIT to compile the bytecode, and I know it's gotten really fast at doing so, but why don't/didn't the language designers just statically compile down to machine instructions once the JVM detects the particular machine it's running on? Is the bytecode interpreted every single pass through the code?
The original design was based on the premise of "compile once, run anywhere". So every implementer of the virtual machine can run the bytecode generated by a compiler.
In the book Masterminds of Programming, James Gosling explained:
James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.
When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.
Java is commonly compiled to machine instructions; that's what just-in-time (JIT) compilation is. But Sun's Java implementation by default only does that for code that is run often enough (so startup and shutdown bytecode, that is executed only once, is still interpreted to prevent JIT overhead).
Bytecode interpretation is usually "fast enough" for a lot of cases. Compiling, on the other hand, is rather expensive. If 90% of the runtime is spent in 1% of the code it's far better to just compile that 1% and leave the other 99% alone.
Static compilation can blow up on you because all the other libraries you use also need to be write-once-run-anywhere (i.e. bytecode), including all of their dependencies. This can lead to a chain of compilations following dependencies that can blow up on you. Compiling code only when the runtime discovers it actually needs that section of code compiled is the general idea, I think. There may be many code paths you never actually follow, especially when libraries come into question.
Java bytecode is interpreted because bytecode is portable across various platforms. The JVM, which is platform dependent, converts and executes the bytecode using the specific instruction set of that machine, whether it is Windows, Linux, Mac, etc.
One important difference of dynamic compiling is that it optimises the code based on how it is run. There is an option -XX:CompileThreshold= which is 10000 by default. You can decrease this so it optimises the code sooner, but if you run a complex application or benchmark, you can find that reducing this number can result in slower code. If you run a simple benchmark, you may not find it makes any difference.
One example where dynamic compiling has an advantage over static compiling is inlining "virtual" methods, especially those which can be replaced. For example, the JVM can inline up to two heavily used "virtual" methods, which may be in a separate jar compiled after the caller was compiled. The called jar(s) can even be removed from the running system (e.g. with OSGi) and another jar added to replace it. The replacement jar's methods can then be inlined. This can only be achieved with dynamic compiling.
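For completeness, a hedged example of the flag mentioned above (MyBenchmark is a placeholder, and with modern tiered compilation the threshold is interpreted differently, so treat this as illustrative):

    java -XX:CompileThreshold=1000 MyBenchmark   # ask the JIT to compile methods after ~1,000 invocations instead of 10,000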

