Java compiler/interpreter - java

Why do we say that Java is a compiled and interpreted language?
What is the advantage of this (being compiled and interpreted)?

Java is compiled to an intermediate "byte code" at compilation time. This is in contrast to a language like C that is compiled to machine language at compilation time. The Java byte code cannot be directly executed on hardware the way that compiled C code can. Instead the byte code must be interpreted by the JVM (Java Virtual Machine) at runtime in order to be executed. The primary drawback of a language like C is that when it is compiled, that binary file will only work on one particular architecture (e.g. x86).
Interpreted languages like PHP are effectively system independent and rely on a system and architecture specific interpreter. This leads to much greater portability (the same PHP scripts work on Windows machines and Linux machines, etc.). However, this interpretation leads to a significant performance decrease. High-level languages like PHP require more time to interpret than machine-specific instructions that can be executed by the hardware.
Java seeks to find a compromise between a purely compiled language (with no portability) and a purely interpreted language (that is significantly slower). It accomplishes this by compiling the code into a form that is closer to machine language (actually, Java byte code is a machine language, simply for the Java Virtual Machine), but can still be easily transported between architectures. Because Java still requires a software layer for execution (the JVM) it is an interpreted language. However, the interpreter (the JVM) operates on an intermediate form known as byte code rather than on the raw source files. This byte code is generated at compile time by the Java compiler. Therefore, Java is also a compiled language. By operating this way, Java gets some of the benefits of compiled languages, while also getting some of the benefits of interpreted languages. However, it also inherits some limitations from both of these languages.
As Bozho points out, there are some strategies for increasing the performance of Java code (and other byte code languages like .Net) through the use of Just in Time (JIT) compilation. The actual process varies from implementation to implementation based on the requirements, but the end-result is that the original code is compiled into byte code at compile time, but then it is run through a compiler at runtime before it is executed. By doing this, the code can be executed at near-native speeds. Some platforms (I believe .Net does this) save the result of the JIT compilation, replacing the byte code. By doing this, all future executions of the program will execute as though the program was natively compiled from the beginning.
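To make the "byte code is a machine language for the JVM" point concrete, here is a minimal sketch. The class name Adder is made up for illustration; javac, java and javap are the standard JDK command-line tools.

```java
// Adder.java -- a hypothetical class used only for illustration.
//
// Compile once:        javac Adder.java     (produces Adder.class)
// Run on any JVM:      java Adder
// Inspect the result:  javap -c Adder       (prints the byte code)
//
// For add(), javap typically shows four stack-machine instructions:
//   iload_0, iload_1, iadd, ireturn
class Adder {
    // Adds two ints; compiled to JVM instructions, not to x86 or ARM code.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

The same Adder.class file can be copied to a machine with a different CPU or OS and run there unchanged, as long as a JVM is installed.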

Why do we say Java is a compiled and interpreted language?
Because source code (.java files) is compiled into bytecode (.class files) that is then interpreted by a Java Virtual Machine (also known as a JVM) for execution (the JVM can do further optimization, but that is another story).
What is the advantage of this (being compiled/interpreted)?
Portability. The same bytecode can be executed on any platform as long as a JVM is installed ("compile once, run anywhere").

This is a long topic and you'd better read about JIT. In short, Java is compiled to bytecode, and the bytecode is later compiled (in the JVM) to machine code.

Java is considered a "compiled" language because code is compiled into bytecode format that is then run by the Java Virtual Machine (JVM). This gives several advantages in the realm of performance and code optimization, not to mention ensuring code correctness.
It is considered an "interpreted" language because, after the bytecode is compiled, it is runnable on any machine that has a JVM installed. It is in this way that Java is much like an interpreted language in that, for the most part, it doesn't depend on the platform on which it is being run. This behavior is similar to other interpreted languages such as Perl, Python, PHP, etc.
One theoretical downside to the fact that Java programs can be run on any system in absence of the source code is that, while this method of distribution ensures cross-platform compatibility, the developers have one less reason to release their source code, driving a wedge between the ideological meanings of the phrases "cross-platform" and "open source".

Java is compiled into byte code, not native binaries. Byte code is not directly executable: it needs the Java Virtual Machine, which performs just-in-time compilation to translate it into machine code at runtime.
At a very basic level, this separates the code that programmers write from the local machine on which the JVM operates, hence the better portability. Compiling to bytecode also helps the performance of just-in-time compilation, reduces file sizes, and more or less helps conceal the real code. (It also catches some errors at compile time.)

Compiled: Your program is a syntactically correct Java program before it starts.
Interpreted: The same (byte-)code runs on different platforms.
Compiled: When your program has compiled correctly you can be sure to have 80% of software bugs under control. And your code will not stop because you have not correctly closed a code block, etc.
Interpreted: You know what applets are? They were the "killer" application when Java came out. Your browser downloads the applet from the website and runs the applet code in your browser. That is not very cool. But the same applet runs on Windows, Linux, Macs, Solaris, ... because what gets run/interpreted is an intermediate language: the byte code.

Related

How can java use Compiler

I read somewhere that Java is interpreted so that it can execute on different processor architectures. If it used a compiler, there would be some machine code instructions specific to a processor architecture, and Java would be platform dependent.
But since Java uses an interpreter, it is processor-architecture independent.
My question is how can the java use JIT (Just In Time) Compiler? Doesn't the processor's architectures affect it? If it doesn't affect it, then why doesn't it affect it?
There isn't just one JIT compiler. There is a different one for each architecture, so there's one for Windows 32-bit, one for Windows 64-bit etc.
Your Java code is the same across all platforms. That is compiled into byte code by the Java compiler. The byte code is also the same across all platforms.
Now we run your Java program on Windows 32-bit. The JVM starts up and it interprets the byte code and turns that into machine code for that architecture. Note that that JVM is specifically for this architecture.
If we run your program on another architecture, another variation of the JVM will be used to interpret the byte code.
That's why you see all these different download links when you download the JRE:
Your Java code is compiled to byte code, which is not platform dependent. But to run your byte code you need a JVM, and the JVM is platform dependent: you cannot download an x86 JVM and run it on an ARM processor, or vice versa.
The idea is that the JVM is platform dependent but your code is not.
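A small sketch of that idea: the class below compiles to one class file, but what it prints depends on which JVM build runs it. The class name PlatformInfo is made up for this example; the three system properties are standard ones that every JVM provides.

```java
// PlatformInfo.java -- hypothetical name, for illustration only.
class PlatformInfo {
    // Builds a one-line description of the JVM and the machine running this class.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // The byte code is identical everywhere; only the answer differs.
        System.out.println(describe());
    }
}
```

Running the same PlatformInfo.class on a Windows x86 machine and on a Linux ARM board should report different operating systems and architectures without recompiling.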
The java program life cycle goes as follows.
Source code is compiled into Java Byte Code (aka .class files),
Java Byte Code is then interpreted by the JVM which performs the Just In Time compilation sending instructions your specific processor architecture can understand.
It's important to understand that compilation is just another way to say "translation", and does not always mean compiling to binary. Also, interpretation is similar, but is done per instruction, as needed by the program.
But more specifically in your question, JIT is the interpretation done by the JVM, which is coded specifically for each processor architecture.

How exactly does the Java interpreter or any interpreter work?

I have been figuring out the exact working of an interpreter, have googled around and have come up with some conclusion, just wanted it to be rectified by someone who can give me a better understanding of the working of interpreter.
So what i have understood is:
An interpreter is a software program that converts code from high level
language to machine format.
speaking specifically about java interpreter, it gets code in binary format
(which is earlier translated by java compiler from source code to bytecode).
now platform for a java interpreter is the JVM, in which it runs, so
basically it is going to produce code which can be run by JVM.
so it takes the bytecode produces intermediate code and the target machine
code and gives it to JVM.
JVM in turns executes that code on the OS platform in which JVM is
implemented or being run.
Now i am still not clear with the sub process that happens in between i.e.
interpreter produces intermediate code.
interpreted code is then optimized.
then target code is generated
and finally executed.
Some more questions:
so is the interpreter alone responsible for generating target code ? and
executing it ?
and does executing means it gets executed in JVM or in the underlying OS ?
An interpreter is a software program that converts code from high level language to machine format.
No. That's a compiler. An interpreter is a computer program that executes the instructions written in a language directly. This is different from a compiler, which converts a higher-level language into a lower-level one. The C compiler goes from C to assembly code, and the assembler (another type of compiler) translates from assembly to machine code -- modern C compilers do both steps to go from C to machine code.
In Java, the java compiler does code verification and converts from Java source to byte-code class files. It also does a number of small processing tasks such as pre-calculation of constants (if possible), caching of strings, etc.
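The pre-calculation of constants can be observed from inside a program. The sketch below is only an illustration (the class and method names are made up); it relies on the language rule that compile-time constant strings are interned, so two equal constants share one object.

```java
// ConstantFolding.java -- hypothetical name, for illustration only.
class ConstantFolding {
    // javac evaluates this expression and stores 86400 in the class file.
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // "hel" + "lo" is a compile-time constant expression, so javac stores the
    // single literal "hello" and both references point at one interned String.
    static boolean foldedLiteralIsShared() {
        String folded = "hel" + "lo";
        return folded == "hello"; // reference comparison on purpose
    }

    // Here one operand is not a constant, so the concatenation happens at
    // runtime and yields a distinct object with equal contents.
    static boolean runtimeConcatIsShared(String prefix) {
        String built = prefix + "lo";
        return built == "hello"; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(SECONDS_PER_DAY);              // prints 86400
        System.out.println(foldedLiteralIsShared());      // prints true
        System.out.println(runtimeConcatIsShared("hel")); // prints false
    }
}
```

Comparing strings with == is normally a bug; it is used here only to show which work was done by the compiler and which was left for runtime.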
now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
The JVM operates on the bytecode directly. The java interpreter is integrated so closely with the JVM that they shouldn't really be thought of as separate entities. What also is happening is a crap-ton of optimization where bytecode is basically optimized on the fly. This makes calling it just an interpreter inadequate. See below.
so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
The JVM is doing these translations.
JVM in turns executes that code on the OS platform in which JVM is implemented or being run.
I'd rather say that the JVM uses the bytecode, optimized user code, the java libraries which include java and native code, in conjunction with OS calls to execute java applications.
now i am still not clear with the sub process that happens in between i.e. 1. interpreter produces intermediate code. 2. interpreted code is then optimized. 3. then target code is generated 4. and finally executed.
The Java compiler generates bytecode. When the JVM executes the code, steps 2-4 happen at runtime inside of the JVM. It is very different than C (for example) which has these separate steps being run by different utilities. Don't think about this as "subprocesses", think about it as modules inside of the JVM.
so is the interpreter alone responsible for generating target code ? and executing it ?
Sort of. The JVM's interpreter by definition reads the bytecode and executes it directly. However, in modern JVMs, the interpreter works in tandem with the Just-In-Time compiler (JIT) to generate native code on the fly so that the JVM can have your code execute more efficiently.
In addition, there are post-processing "compilation" stages which analyze the generated code at runtime so that native code can be optimized by inlining often-used code blocks and through other mechanisms. This is the reason why the JVM load spikes so high on startup. Not only is it loading in the jars and class files, but it is in effect doing a cc -O3 on the fly.
and does executing means it gets executed in JVM or in the underlying OS ?
Although we talk about the JVM executing the code, this is not technically correct. As soon as the byte-code is translated into native code, the execution of the JVM and your java application is done by the CPU and the rest of the hardware architecture.
The Operating System is the substrate that does all of the process and resource management so the programs can efficiently share the hardware and execute efficiently. The OS also provides the APIs for applications to easily access the disk, network, memory, and other hardware and resources.
1) An interpreter is a software program that converts code from high level language to machine format.
Incorrect. An interpreter is a program that runs a program expressed in some language that is NOT the computer's native machine code.
There may be a step in this process in which the source language is parsed and translated to an intermediate language, but this is not a fundamental requirement for an interpreter. In the Java case, the bytecode language has been designed so that neither parsing nor a distinct intermediate language is necessary.
2) speaking specifically about java interpreter, it gets code in binary format (which is earlier translated by java compiler from source code to bytecode).
Correct. The "binary format" is Java bytecodes.
3) now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
Incorrect. The bytecode interpreter is part of the JVM. The interpreter doesn't run on the JVM. And the bytecode interpreter doesn't produce anything. It just runs the bytecodes.
4) so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
Incorrect.
5) JVM in turns executes that code on the OS platform in which JVM is implemented or being run.
Incorrect.
The real story is this:
The JVM has a number of components to it.
One component is the bytecode interpreter. It executes bytecodes pretty much directly [1]. You can think of the interpreter as an emulator for an abstract computer whose instruction set is bytecodes.
A second component is the JIT compiler. This translates bytecodes into the target machine's native machine code so that it can be executed by the target hardware.
1 - A typical bytecode interpreter does some work to map abstract stack frames and object layouts to concrete ones involving target-specific sizes and offsets. But to call this an "intermediate code" is a stretch. The interpreter is really just enhancing the bytecodes.
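The "emulator for an abstract computer" idea can be sketched in a few lines. This is a toy, not how any real JVM is written: the opcodes PUSH, ADD, MUL and HALT are invented for the example, and real JVM bytecode has a different and much larger instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// TinyStackVm.java -- a toy interpreter with made-up opcodes.
class TinyStackVm {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Fetches one instruction at a time and executes it on an operand stack.
    // Nothing is translated or emitted; the loop itself does the work.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break; // operand follows opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4 for the toy machine.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

A JIT compiler, by contrast, would turn the program array into native instructions once and then let the hardware run them without this loop.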
Giving a 1000 foot view which will hopefully clear things up:
There are 2 main steps to a java application: compilation, and runtime. Each process has very different functions and purposes. The main processes for both are outlined below:
Compilation
This is (normally) executed by com.sun.tools.javac, usually found in the tools.jar file, traditionally in your $JAVA_HOME - the same place as java.jar, etc.
The goal here is to translate .java source files into .class files which contain the "recipe" for the java runtime environment.
Compilation steps:
Parsing: the files are read, and stripped of their 'boundary' syntax characters, such as curly braces, semicolons, and parentheses. These exist to tell the parser which java object to translate each source component into (more about this in the next point).
AST creation: The Abstract Syntax Tree is how a source file is represented. This is a literal "tree" data structure, and the root class for this is com.sun.tools.javac.tree.JCTree. The overall idea is that there is a java object for each Expression and each Statement. At this point in time relatively little is known about the actual "types" that each represents. The only thing that is checked for at the creation of the AST is literal syntax.
Desugar: This is where for loops and other syntactical sugar are translated into simpler form. The language is still in 'tree' form and not bytecode so this can easily happen
Type checking/Inference: Where the compiler gets complex. Java is a statically typed language, so the compiler has to go over the AST using the Visitor Pattern and figure out the types of everything ahead of time, making sure that at runtime everything (well, almost) will be legal as far as types, method signatures, etc. go. If something is too vague or invalid, compilation fails.
Bytecode: Control flow is checked to make sure that the program execution logic is valid (no unreachable statements, etc.) If everything passes the checks without errors, then the AST is translated into the bytecodes that the program represents.
.class file writing: at this point, the class files are written. Essentially, the bytecode is a small layer of abstraction on top of specialized machine code. This makes it possible to port to other machines/CPU structures/platforms without having to worry about the relatively small differences between them.
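As an illustration of the desugar step in the list above, the enhanced for loop over a collection is defined by the language specification in terms of an Iterator. The sketch below writes both forms by hand; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// DesugarDemo.java -- hypothetical name, for illustration only.
class DesugarDemo {
    // The form a programmer writes.
    static int sumEnhanced(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly what the compiler rewrites it into before emitting byte code:
    // an explicit Iterator plus explicit unboxing of each element.
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next().intValue();
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumEnhanced(values));  // prints 6
        System.out.println(sumDesugared(values)); // prints 6
    }
}
```

Because the rewrite happens on the tree, the later stages (type checking, byte code generation) only need to understand the simpler form.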
Runtime
There is a different Runtime Environment/Virtual Machine implementation for each computer platform. The Java APIs are universal, but the runtime environment is an entirely separate piece of software.
JRE only knows how to translate bytecode from the class files into machine code compatible with the target platform, and that is also highly optimized for the respective platform.
There are many different runtime/VM implementations, but the most popular one is the HotSpot VM.
The VM is incredibly complex and optimizes your code at runtime. Startup times are slow but it essentially "learns" as it goes.
This is the 'JIT' (Just-in-time) concept in action - the compiler did all of the heavy lifting by checking for correct types and syntax, and the VM simply translates and optimizes the bytecode to machine code as it goes.
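If you want to see whether the VM you are running on reports a JIT compiler, the standard java.lang.management API exposes it. This is a minimal sketch; the class name JitInfo is made up, and the reported compiler name and timings depend entirely on the JVM in use.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// JitInfo.java -- hypothetical name, for illustration only.
class JitInfo {
    // Describes the JIT compiler of the running JVM, if it reports one.
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported (interpreter only)";
        }
        String text = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", time spent compiling so far: "
                    + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        // Do some repetitive work first so the VM has something worth compiling.
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sum += i % 7;
        }
        System.out.println("checksum " + sum);
        System.out.println(describe());
    }
}
```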
Also...
The Java compiler API was standardized under JSR 199. While not exactly falling under same thing (can't find the exact JLS), many other languages and tools leverage the standardized compilation process/API in order to use the advanced JVM (runtime) technology that Oracle provides, while allowing for different syntax.
See Scala, Groovy, Kotlin, Jython, JRuby, etc. All of these leverage the Java Runtime Environment because they translate their different syntax to be compatible with the Java compiler API! It's pretty neat - anyone can write a high-performance language with whatever syntax they want because of the decoupling of the two. There's adaptations for almost every single language for the JVM
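For reference, the JSR 199 compiler API lives in the javax.tools package. Below is a rough sketch of using it to compile a source file at runtime and then call the result. It assumes it is run on a JDK (a plain JRE has no system compiler), and the names CompileAtRuntime and Greeter are invented for the example.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// CompileAtRuntime.java -- hypothetical name, for illustration only.
class CompileAtRuntime {
    // Writes a tiny source file, compiles it with the JSR 199 API,
    // loads the resulting class file and calls its static method.
    static String compileAndGreet() {
        try {
            Path dir = Files.createTempDirectory("jsr199-demo");
            Path source = dir.resolve("Greeter.java");
            String code = "public class Greeter {"
                    + " public static String greet() {"
                    + " return \"hello from generated code\"; } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no compiler available; a JDK is required");
            }
            // Same as running "javac Greeter.java"; 0 means success.
            int status = compiler.run(null, null, null, source.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed");
            }

            // Greeter.class was written next to the source file.
            try (URLClassLoader loader =
                         new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> greeter = loader.loadClass("Greeter");
                Method greet = greeter.getMethod("greet");
                return (String) greet.invoke(null);
            }
        } catch (Exception e) {
            throw new IllegalStateException("demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndGreet()); // prints hello from generated code
    }
}
```

The sketch leaves its temporary directory behind; a real tool would clean it up or compile in memory.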
I'll answer based on my experience on creating a DSL.
C is compiled because you pass the source code to gcc and then run the stored program as machine code.
Python is interpreted because you run programs by passing the program source to the interpreter. The interpreter reads the source file and executes it.
Java is a mix of both because you "compile" the Java file to bytecode, then invoke the JVM to run it. Bytecode isn't machine code; it needs to be interpreted by the JVM. Java is in a tier between C and Python because you cannot do fancy things like an "eval" (evaluating chunks of code or expressions at runtime, as in Python). However, Java has reflection abilities that are impossible for a C program to have. In short, the design of the Java runtime, sitting in an intermediate tier between a purely compiled and an interpreted language, gives the best (and the worst) of the two worlds in terms of performance and flexibility.
However, Python also has a virtual machine and its own bytecode format. The same applies to Perl, Lua, etc. Those interpreters first convert a source file to bytecode, then they interpret the bytecode.
I always wondered why doing this is necessary, until I made my own interpreter for a simulation DSL. My interpreter does a lexical analysis (breaking the source into tokens), converts it to an abstract syntax tree, then evaluates the tree by traversing it. For software engineering's sake I'm using some design patterns and my code heavily uses polymorphism. This is very slow in comparison to processing an efficient bytecode format that mimics a real computer architecture. My simulations would be way faster if I created my own virtual machine or used an existing one. For evaluating a long numeric expression, for instance, it would be faster to translate it to something similar to assembly code than to process a branch of an abstract syntax tree, since the latter requires calling a lot of polymorphic methods.
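To show the two approaches side by side, here is a small sketch, not the DSL interpreter described above. The same expression is evaluated once by walking a tree of objects (one polymorphic call per node) and once by flattening it into a linear instruction list that a simple loop executes. All names and opcodes are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// AstVersusFlatCode.java -- hypothetical name, for illustration only.
class AstVersusFlatCode {
    static final int PUSH = 0, ADD = 1, MUL = 2; // made-up opcodes

    static final class Instr {
        final int op;
        final double value; // only meaningful for PUSH
        Instr(int op, double value) { this.op = op; this.value = value; }
    }

    interface Node {
        double eval();              // tree-walking interpretation
        void emit(List<Instr> out); // "compile" the node to postfix code
    }

    static Node num(double v) {
        return new Node() {
            public double eval() { return v; }
            public void emit(List<Instr> out) { out.add(new Instr(PUSH, v)); }
        };
    }

    static Node binary(int op, Node left, Node right) {
        return new Node() {
            public double eval() {
                double l = left.eval(), r = right.eval();
                return op == ADD ? l + r : l * r;
            }
            public void emit(List<Instr> out) {
                left.emit(out);
                right.emit(out);
                out.add(new Instr(op, 0));
            }
        };
    }

    // Executes the flat code with an array as operand stack: no recursion,
    // no virtual dispatch per node, just a loop and a switch.
    static double run(List<Instr> code) {
        double[] stack = new double[code.size()];
        int sp = 0;
        for (Instr in : code) {
            switch (in.op) {
                case PUSH: stack[sp++] = in.value; break;
                case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
                case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
                default:   throw new IllegalStateException("bad opcode " + in.op);
            }
        }
        return stack[0];
    }

    public static void main(String[] args) {
        Node tree = binary(ADD, num(1), binary(MUL, num(2), num(3))); // 1 + 2 * 3
        List<Instr> code = new ArrayList<>();
        tree.emit(code);
        System.out.println(tree.eval()); // prints 7.0
        System.out.println(run(code));   // prints 7.0
    }
}
```

Whether the flat form is actually faster depends on the workload and the VM; the sketch only shows why it avoids the per-node method calls.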
There are two ways of executing a program.
By way of a compiler: this parses a text in the programming language (say .c) to machine code, on Windows .exe. This can then be executed independent of the compiler.
This compilation can be done by compiling several .c files to several object files (intermediate products), and then linking them into a single application or library.
By way of an interpreter: this parses a text in the programming language (say .java) and "immediately" executes the program.
With Java the approach is a bit hybrid/stacked: the Java compiler javac compiles .java to .class files, and those are possibly zipped into a .jar (or .war, .ear ...). The .class files consist of a more abstract byte code, for an abstract stack machine.
Then the Java runtime java (called the JVM, Java virtual machine, or byte code interpreter) can execute a .class/.jar. This is in fact an interpreter of Java byte code.
Nowadays it also translates (parts of) the byte code at run time to machine code. This is done by what is called a just-in-time compiler from byte code to machine code.
In short:
- a compiler just creates code;
- an interpreter immediately executes.
An interpreter will loop over parsed commands / a high-level intermediate code, and interpret every command with a piece of code. This is indirect and, in principle, slow.

Confused about advantage of interpreted language

I'm confused about the advantage of an interpreted language like java, over a compiled language.
The standard explanation for the advantage of an interpreted language, such as java, over a compiled language is that the same .class file can run on different types of machine architectures. How does this save you any work?
For each different machine architecture, wouldn't you need a different compiler to interpret the same .class file into the machine language? So if you need a different compiler for each different machine architecture to interpret the same .class file into machine code, how does this save you any work?
Why not just make a compiled language where the .java source file is compiled into machine language right away? Sure, this would require a different compiler to compile from the java source file to machine language for each machine architecture, but how is this any more work than having to have a different compiler for each machine compile from a .class file to machine language?
I mean this is the same as with a compiled language -- you need a compiler for each machine architecture whether it's compiling a java source file into machine code or a class file into machine code.
thanks.
First, the claim that "Java is interpreted", while it has some basis in truth, is pretty misleading, and it's probably best if you simply delete that notion from your head.
Java is compiled from source code to an intermediate representation (classfiles, bytecode) at build time. When a class is first loaded, most JVM implementations (including Oracle HotSpot, and IBM J9) will go through a short-lived interpretation phase, but if the class is going to be used with any frequency, a dynamic compiler (JIT) will run and compile to native code. (Some JVMs, like JRockit, go directly to native with no interpreter.)
I think "Why isn't Java compiled directly to native code" is the real question here. The obvious answer, as others have suggested, is portability. But it's more subtle than that: dynamic compilation yields higher quality code than static compilation. When the code is compiled dynamically, the JIT knows things that no static compiler could know: characteristics of the current hardware (CPU version, stepping level, cache line size, etc), as well as properties of the current run (thanks to profiling data gathered during interpretation.) The quality of your decisions is dependent on the quality of your information; a dynamic compiler has more and better information available to it than a static compiler ever could, and so can produce better code.
Further, dynamic compilers have the possibility to perform optimizations that no static compiler could dream of. For example, dynamic compilers can do speculative optimizations, and back them out if they turn out to be ineffective or if their assumptions later become incorrect. (See "Dynamic Deoptimization" here: http://www.ibm.com/developerworks/library/j-jtp12214/).
For each different machine architecture, wouldn't you need a different
compiler to interpret the same .class file into the machine language?
So if you need a different compiler for each different machine
architecture to interpret the same .class file into machine code, how
does this save you any work?
The above statement is your core misunderstanding.
Application developers write Java code that is compiled to byte code that can run on any compliant Java Virtual Machine.
The Java Virtual Machine interprets (and possibly compiles) this bytecode and executes the application. These JVMs are developed for the major architectures and operating systems, such as Windows/Mac/Linux on Intel. The JVM developers are a relatively small group of engineers, such as those from Oracle and Sun.
From the application developers' point of view, he or she only has to compile once because the byte code (in the .class file) can be executed on compliant JVMs. Application developers do not need to worry about the underlying architecture or OS; they only need to target the architecture of the JVM. The JVM developers are the ones who deal with the underlying layer.
I like sowrd299's answer. My two cents:
you have technically endless different target architectures
and therefore compiling your code for all targets at the same time and packing it together would result in an infinitely big executable
therefore compiling Java against a virtual machine byte code is a better solution, since it has the footprint of only one target architecture
while developers can separately add JVMs for all new and old architectures, allowing totally new things (such as a Raspberry Pi) to run your Java code compiled in the previous century.
On the other hand, "compile for multiple targets in advance" is not a totally insane thing. Afaik Windows Universal Apps work this way: it is the same application in the same exe file, but the exe actually contains the code compiled for an 80x86 as well as for an ARM target. This way one application appears to be portable across Windows mobile and desktop without any further interpreting.
First, Java is a compiled language as well as an interpreted language, because you have to compile from .java to .class.
To get to the meat of your question, the advantage Java gains by being (somewhat) interpreted is that you only need to compile each program once, and it can run on any computer, because the Java Runtime Environment (JRE), which is compiled in advance to match the local OS and architecture, can bridge that gap without (or with minimal) further compiling.
In an uninterpreted language, however, you must compile each program for each OS and each Architecture you want it to run on, which entails much more effort and total compile time than just compiling the JRE once for each OS and architecture and only compiling each individual program once.
It would be impractical to have a language that compiles for the local architecture each and every time it runs, because compiling is a rather intensive process. Python does compile each time it runs (though, like Java, it compiles for a Runtime Environment, not the local architecture) and it is one of the slowest languages out there.
Hopefully that helped clear things up.

Why Java is both compiled and interpreted language when the JIT also compiles the bytecode?

I read that, a java source code is compiled into 'bytecode' then it is 'Compiled' again by JIT into 'machine code'. That is, the source code is first compiled into a platform independent bytecode and then compiled again to a machine specific code. Then why it is called as both interpreted and compiled language? Where the interpretation takes place?
There is a bit of misunderstanding here.
In normal circumstances the Java compiler (javac) compiles Java code to bytecode, and the Java interpreter (java) interprets these bytecodes one instruction at a time, translating each into machine operations and executing it.
The JIT (Just-In-Time) compiler is a slightly different concept. The JVM maintains a count of how many times a function is executed. If the count exceeds a limit, the JIT comes into the picture: the function's bytecode is compiled directly into machine language, and from then on that machine code is used to execute the function.
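The counting scheme can be mimicked in plain Java. This is only a toy model of the idea, not how a JVM is implemented: the two function objects stand in for "interpreting the bytecode" and "running the compiled machine code", and the threshold of 3 is invented (real JVMs use much larger, tunable limits).

```java
import java.util.function.IntUnaryOperator;

// HotMethodSketch.java -- a toy model with made-up names and threshold.
class HotMethodSketch {
    static final int THRESHOLD = 3; // hypothetical limit

    private final IntUnaryOperator slowPath; // stands in for the interpreter
    private final IntUnaryOperator fastPath; // stands in for JIT-compiled code
    private int invocations = 0;
    private boolean compiled = false;

    HotMethodSketch(IntUnaryOperator slowPath, IntUnaryOperator fastPath) {
        this.slowPath = slowPath;
        this.fastPath = fastPath;
    }

    // Counts calls; once the count exceeds the threshold the method is
    // considered "hot" and every later call takes the fast path.
    int call(int arg) {
        if (!compiled && ++invocations > THRESHOLD) {
            compiled = true; // a real JVM would run the JIT compiler here
        }
        return compiled ? fastPath.applyAsInt(arg) : slowPath.applyAsInt(arg);
    }

    boolean isCompiled() {
        return compiled;
    }

    public static void main(String[] args) {
        HotMethodSketch square = new HotMethodSketch(x -> x * x, x -> x * x);
        for (int i = 1; i <= 5; i++) {
            int result = square.call(i);
            System.out.println(i + "^2 = " + result + ", compiled: " + square.isCompiled());
        }
    }
}
```

With the threshold of 3, the first three calls report compiled: false and the fourth and fifth report compiled: true, while the results stay the same on both paths.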
Java is a programming language.
It has a specification (the JLS) that defines how Java programs should act.
As a language itself, it does not specify how it should be executed on different platforms. The way it runs, with a JIT or without a JIT is entirely implementation based.
If I write a Java runtime tomorrow that does not do JIT compilation at all I can call Java interpreted.
If I take a Java machine (and people seriously made those) that uses Java bytecode as assembly, I can call Java strictly compiled.
A lot of other languages do this:
Is python an interpreted language? (CPython) or is it JITed (PyPy)?
Is Lua interpreted (old lua interpreters) or is it compiled (LuaJIT)?
Is JavaScript interpreted (IE6 style) or is it compiled (v8)?
For the sake of precision, let's make clear this is not a Java programming language question, but a JVM feature.
In the first JVM implementations, JIT didn't exist and bytecode was always interpreted. This was due to a design decision to make compiled code independent of the physical machine and OS running Java, and is still valid today.
As a later refinement, JIT was introduced in the JVM implementation for a faster execution, but the bytecode must still be valid and pass all the validations before being translated to binary. This way you keep the platform independence, all the sanity and security checks and you gain performance.
Java is a hybrid language, i.e. it is both compiled (work done up front) and interpreted (work done at the receiving end).
Byte code is an IL (Intermediate Language) for Java. Java source code is compiled to bytecode by javac. Sometimes this byte code is compiled again into machine language, which is referred to as JIT (Just-In-Time) compilation.
JIT compilation is a way of executing computer code that involves compilation during execution of a program – at run time – rather than prior to execution.
A JVM (without JIT) interprets the Java Intermediate Language byte code to native machine language.
JVM is an abstract computing machine, it has several implementations:
HotSpot (Interpreter + JIT compiler) : the primary reference Java VM implementation. Used by both Oracle Java and OpenJDK.
JamVM (Interpreter) Developed to be an extremely small virtual machine compared to others. Designed to use GNU Classpath. Supports several architectures. GPL.
ART (Interpreter + AOT compiler i.e. Ahead-of-time compilation) Android RunTime is an application runtime environment used by the Android operating system replacing Dalvik (interpreter + JIT compiler).
List of Java virtual machines
javac is a compiler that converts Java code into bytecode, which is easy to run on any machine that has a JVM (Java Virtual Machine). The JVM's interpreter then executes that bytecode on the machine.
It serves two purposes. The first is to ensure that the code is syntactically and semantically correct. Secondly, the compilation process produces byte-code. As you note, this is an architecture-agnostic intermediate language that can be interpreted or just-in-time compiled to native code by the JVM for a specific machine architecture. By compiling to byte-code, much of the overhead associated with compilation can be done in advance, leaving the JVM to generate native code from or interpret byte-code that has been thoroughly and rigorously checked beforehand.
Unlike many other programming languages, Java is both a compiled and an interpreted language. The Java compiler (javac, usually invoked for you by the IDE) does the compiling, and the JVM (Java Virtual Machine) behaves like an interpreter. For example, when a program, say Hello, is saved as Hello.java and then compiled, we get a file named Hello.class. This file is called a class file, and it contains the bytecode, or intermediate code. Bytecode does not depend on any specific machine, which is why it is also called intermediate code.
To convert this bytecode into machine code, i.e. a format the machine understands, the JVM is used, and the JVM itself is different for each operating system. The JIT (just-in-time) compiler is a part of the JVM that is enabled by default and compiles the bytecode into native machine code "just in time", while the program is running.
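The Hello.java to Hello.class step described above can be reproduced from inside a Java program. The following is only a minimal sketch: it assumes it is run on a full JDK (so that the standard `javax.tools` compiler API is available), and the class name `ClassFileDemo` and the temp-directory name are made up for illustration. It compiles a tiny `Hello` class and reads the first four bytes of the resulting class file, which the class file format defines as the magic number `0xCAFEBABE`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Illustrative sketch only; ClassFileDemo is a hypothetical name.
final class ClassFileDemo {
    static final String HELLO_SOURCE =
          "public class Hello {\n"
        + "    public static void main(String[] args) {\n"
        + "        System.out.println(\"Hello\");\n"
        + "    }\n"
        + "}\n";

    /** Writes Hello.java to a temp directory, compiles it, and returns the path of Hello.class. */
    static Path compileHello() {
        // Returns null when no compiler is present (e.g. on a JRE-only install).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No compiler available; run this on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("hello-demo");
            Path source = dir.resolve("Hello.java");
            Files.writeString(source, HELLO_SOURCE);
            // Same effect as running "javac Hello.java"; the class file lands next to the source.
            int status = javac.run(null, null, null, source.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }
            return dir.resolve("Hello.class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the first four bytes of a class file as a big-endian int. */
    static int readMagic(Path classFile) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(classFile))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path classFile = compileHello();
        System.out.printf("%s starts with 0x%08X%n", classFile.getFileName(), readMagic(classFile));
    }
}
```

The resulting Hello.class contains no x86 or ARM instructions, only bytecode, so the same file can be copied to any machine with a JVM and run there with `java Hello`.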

What interprets Java's byte code

I was wondering if Java gets assembled, and in my readings I found that the compiler creates byte code which is then run on the Java Virtual Machine. Does the JVM interpret the byte code and execute it?
This is why I'm confused. Today in class the prof said "the compiler takes a high level language, creates assembly language, then the assembler takes the assembly language and creates machine language (binary) which can be run". So if Java compiles to bytecode how can it be run?
There is a standard compiler setup, such as would be used for the C language, and then there is Java, which is significantly different.
The standard C compiler compiles (through several internal phases) into "machine instructions" which are directly understood by the x86 processor or whatever.
The Java compiler, on the other hand, compiles to what are sometimes called "bytecodes". These are machine instructions, but for an imaginary machine, the Java Virtual Machine. So the JVM interprets the bytecodes just like a "real" machine processes its machine instructions. (The main advantage of this is that a program compiled into bytecodes will run on any JVM, whether it be on an x86 system, an IBM RISC box, or the ARM processor in an Android device. So long as there's a JVM, the code will run.)
(There have historically been a number of "virtual machines" similar to Java, the UCSD Pascal "P-code" system being one of the more successful ones.)
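To make the idea of "machine instructions for an imaginary machine" concrete, here is a toy interpreter for an invented stack machine. It is only a sketch of the general technique: the opcodes `PUSH`, `ADD`, `MUL` and `HALT` and the class name `ToyStackMachine` are made up for this example and are not real JVM opcodes, and a real JVM does far more (class loading, verification, objects, garbage collection).

```java
// Illustrative sketch only; not how any real JVM is implemented.
final class ToyStackMachine {
    // Hypothetical opcodes for this toy machine (NOT real JVM bytecodes).
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    /** Interprets the program one instruction at a time and returns the result. */
    static int run(int[] code) {
        int[] stack = new int[16]; // operand stack, fixed size for simplicity
        int sp = 0;                // stack pointer: index of the next free slot
        int pc = 0;                // program counter: index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalArgumentException("Unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // "Machine code" for the imaginary machine: computes (2 + 3) * 4.
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // prints 20
    }
}
```

The `program` array plays the role of the bytecode: it never changes, and it will produce the same result on any hardware for which someone has written the `run` loop.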
But it gets more complicated --
Interpreting "bytecodes" is fairly slow and inefficient, so most Java implementations have some sort of scheme to translate the bytecodes into "real" machine instructions. In some cases this is done statically, in a separate compile step, but most often it's done with a "just-in-time compiler" (JITC) which converts small portions of the bytecodes to machine instructions while the application is running. These get to be quite elaborate, with complex schemes to decide which segments of code will benefit most from translating into hardware machine instructions. But they all, for the most part, do their magic without you needing to be aware of what's going on, and without you having to compile your Java code to target a specific type of processor.
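You normally never see the JIT at work, but the standard `java.lang.management` API lets a program ask the running JVM whether it has one. The sketch below is a small illustration under that assumption; the class name `JitInfo` and the `hotLoop` method are invented for the example, and the exact name and timings reported depend on the JVM implementation and its settings.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only; JitInfo and hotLoop are hypothetical names.
final class JitInfo {
    /** Describes the JIT compiler of the running JVM, if it reports one. */
    static String describeJit() {
        // Returns null when the JVM has no compilation system (pure interpreter).
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "No JIT compiler reported (interpreter only)";
        }
        String description = "JIT compiler: " + jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            description += ", total compile time so far: " + jit.getTotalCompilationTime() + " ms";
        }
        return description;
    }

    /** A loop that runs often enough to be a likely candidate for JIT compilation. */
    static long hotLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
        System.out.println("hotLoop result: " + hotLoop(50_000_000));
        System.out.println(describeJit()); // compile time has usually grown by now
    }
}
```

On HotSpot, running the same program with the `-Xint` option forces interpreter-only execution, which makes the cost of pure interpretation easy to observe.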
Think of bytecode as the machine language of the JVM. (Compilers don't HAVE to produce assembly code which has to be assembled, but they're a lot easier to write that way.)
Just a clarifying note:
What Java calls "bytecode" corresponds to the "creates machine language (binary) which can be run" step in your original description.
So the answer to how to run java bytecode is:
You build a processor which can handle Java bytecode, in the same way that if you want to execute normal x86 code you build a CPU to handle that.
Java's binary machine language is not really different from the binary instruction format of other CPUs such as x86 or PowerPC. And CPUs do exist which can execute Java bytecode directly. (That would not be a normal Intel/AMD CPU.)
Another example: how would you run PowerPC code on a normal Intel CPU? You would build a piece of software which, at runtime, translates the PowerPC binary code to x86 code. The case for Java is not really that different. To run Java code on an x86 CPU, you need a program which translates the Java binary code (aka the bytecode) to x86 binary code. This is what the JVM* does. It does this either by interpreting the Java instructions one at a time, or by translating a large chunk of instructions at a time (called JIT). Exactly how the JVM handles the translation depends on which JVM implementation you use and its settings. (There are multiple independent JVM implementations which implement their translation in different ways.)
But there is one thing which makes Java a bit different. Unlike other binary instruction formats such as x86, Java bytecode was never really designed to run directly on a CPU. Its binary format is designed in a way which makes it easy to translate to binary code for "normal" CPUs such as x86 or PowerPC.
*The JVM does in fact handle more than just translating the Java binary code to processor-dependent code. It also handles memory allocation for Java programs, and it handles communication between a Java program and the user's operating system. This is done to make the Java program relatively independent of the user's operating system and platform details.
In short: the JVM translates the Java bytecode into machine-specific code. The generated machine-specific code is then executed by the machine.
The Java compiler translates Java into bytecode. The JVM translates bytecode into machine-specific code at runtime. The machine executes that machine code.
