Role of Compiler and JVM in java? [closed] - java

Closed 6 years ago.
I am new to Java and a bit confused about the roles of the compiler and the JVM. I have read a couple of resources, to name a few:
What-are-the-functions-of-Java-Compiler ?
Is the JVM a compiler or an interpreter?
Compiler
When I save a .java file, the computer stores it internally as bytes (0s and 1s). I understand that the compiler validates whether the program
conforms to the Java standard. If not, it reports an error; otherwise it generates the class file.
My question is: what is the need for generating a .class file?
Can't the JVM interpret the source directly (from the bytes of the .java file) without needing a .class file? Does the compiler (javac) do any kind of optimization here?
JVM: This question is the other way around. Can't the compiler generate machine code that the CPU can execute directly? Why is the JVM needed here?
Is the JVM required to interpret the bytecode for a specific platform, for example Windows or Linux?

The compiler generates byte code in the Java sense: a .class file containing instructions for the JVM rather than native machine code for any particular CPU, so it is the same no matter what OS you are running on.
The JVM then interprets that byte code to run it on a specific OS.

The main point of having these two stages and intermediate representation ("byte code" in Java) is platform-independence.
A Java program in source form is platform-independent (to a degree). When you compile it into byte code, the Java compiler does (almost) everything it can do while still maintaining that platform-independence:
validates syntax
performs static type checks
translates human-readable source code into machine-readable byte code
does some static optimizations (javac mainly folds constants; the heavy optimization happens later, in the JVM)
etc.
These are all things, that:
maintain platform-independence
only need to be performed once, since they don't rely on any run-time data
take (possibly a long) time, so it would be a waste to do them again each time the code is executed
So now you have .class files with byte code, which are still platform-independent and can be distributed to a different OS or even a different hardware platform.
And then the second step is the JVM. The JVM itself is platform-specific, and its task is to translate/compile/interpret the byte code on the target architecture and OS. That means:
translate byte code to the instruction set of the given platform, using the target OS's system calls
run-time optimizations
etc.

What-are-the-functions-of-Java-Compiler ?
javac is the Java compiler. It produces bytecode (.class files), and that bytecode is platform-independent.
Is the JVM a compiler or an interpreter? Answer: an interpreter that also contains a compiler.
The JVM (Java Virtual Machine) runs bytecode, either by interpreting it or by translating it into native machine code. For the translation it internally uses a JIT (just-in-time) compiler.
The JIT compiles a given bytecode instruction sequence to machine code at run time, before executing it natively, and does all the heavy optimization.
All of this complexity exists to make Java a "compile once, run anywhere", platform-independent language. The bytecode that javac outputs is platform-independent, but the JVM executing that bytecode is platform-dependent, i.e. there are different JVMs for Windows and Unix.

A JVM, is an implementation of the Java Virtual Machine Specification. It interprets compiled Java binary code (called bytecode) for a computer's processor (or "hardware platform").
JVM is the target machine for byte-code instead of the underlying architecture.
The Java compiler, javac, produces byte-code which is platform-independent. This byte-code is generic, i.e. it does not include machine-level details that are specific to each platform.
The instructions in this byte-code cannot be directly run by the CPU.
Therefore some other 'program' is needed which can interpret the code, and give the CPU machine level instructions which it can execute. This program is the 'JVM' (Java Virtual Machine) which is platform specific.
That's why you have different JVM for Windows, Linux or Solaris.
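A minimal sketch of this point, using only the standard system property keys java.vm.name, os.name and os.arch. The class name WhereAmI is made up for illustration. The same compiled WhereAmI.class can be copied unchanged to Windows, Linux or Solaris; only the answer it gets back from the platform-specific JVM differs.

```java
public class WhereAmI {
    // Builds a one-line description of the JVM and platform this class is running on.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name")
                + "/" + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // The bytecode of this class is identical everywhere;
        // the printed text depends on the JVM that happens to run it.
        System.out.println(describe());
    }
}
```

Running the same class file on two different machines should print two different lines without any recompilation.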

I think first we should discuss the difference between executing an interpreted runtime vs a machinecode runtime vs a bytecode runtime.
In an interpreted runtime the (generally human readable) source code is converted to machine code by the interpreter only at the point when the code is run. Generally this confers advantages such that code is platform independent (so long as an interpreter exists for your platform) and ease of debugging (the code is right there in front of you) but at the cost of relatively slow execution speed as you have the overhead of converting the source code into machine code when you run the program.
In a compiled runtime the source code has been compiled into native machine code ahead of time by a dedicated compiler. This gives fast execution speed (because the code is already in the format expected by the processor), but means that the thing you distribute (the compiled binary) is generally tied to a given platform.
A bytecode runtime is sort of a halfway house that aims to give the advantages of both interpretation and compilation. In this case the source code is compiled down into an intermediate format (byte code) ahead of time and then converted into machine code at runtime. The byte code is designed to be machine friendly rather than human friendly, which means it's much faster to convert to machine code than the source of a traditionally interpreted language. Additionally, because the actual conversion to machine code is done at run time, you still get all that nice platform independence.
Note that the choice of whether to interpret or compile is independent of the language used: for example, there is no reason in theory why you could not have a C interpreter or compile Python directly into machine code. Of course, in practice most languages are generally only either compiled or interpreted.
So this brings us back to the question of what the Java compiler does: essentially, its main job is to turn all your nice human-readable java files into Java bytecode (class files) so that the JVM can efficiently execute them.
The JVM's main job, on the other hand, is to take those class files and turn them into machine code at execution time. Of course it also does other stuff (for example it manages your memory and it provides various standard libraries), but from the point of view of your question it's the conversion to machine code that's important!

Java bytecode is an intermediate, compact way of representing a series of operations. The processor can't execute these directly. A processor executes machine instructions; they are the only thing a processor can understand.
The JVM processes the stream of bytecode operations and turns them into a series of machine instructions for the processor to execute.
My question is what is need of generating .class file. Can't JVM interpret it directly (from bytes generated corresponding to .java file) without needing .class file? Does compiler (javac) do any kind of optimization here ?
javac generates the .class file, which is an intermediate form used to achieve platform independence.
To see what compiler optimized, simply decompile the bytecode, for instance with javap or JD.
This question is other way around. Can't compiler generate the byte/machine code which can be interpreted by CPU directly? so why JVM is needed here ? Is JVM required to interpret the byte code specific to platform for example windows or linux ?
The designers of the Java language decided that the language and the compiled code was going to be platform independent, but since the code eventually has to run on a physical platform, they opted to put all the platform dependent code in the JVM. Hence for this reason we need JVM to execute code.
To see what the just in time compiler optimized, activate debug logging and read the disassembled machine code.

Related

How can java use Compiler

I studied somewhere that to execute on different processor architectures Java is interpreted. If it would use compiler then there would be some (Machine Code) instructions which would be specific to processor architectures and Java would be platform dependent.
But since Java uses an interpreter, it is independent of the processor architecture.
My question is how can the java use JIT (Just In Time) Compiler? Doesn't the processor's architectures affect it? If it doesn't affect it, then why doesn't it affect it?
There isn't just one JIT compiler. There is a different one for each architecture, so there's one for Windows 32-bit, one for Windows 64-bit etc.
Your Java code is the same across all platforms. That is compiled into byte code by the Java compiler. The byte code is also the same across all platforms.
Now we run your Java program on Windows 32-bit. The JVM starts up and it interprets the byte code and turns that into machine code for that architecture. Note that that JVM is specifically for this architecture.
If we run your program on another architecture, another variation of the JVM will be used to interpret the byte code.
That's why you see all these different download links when you download the JRE.
Your Java code is compiled to byte code, which is not platform dependent. But to run that byte code you need a JVM, and the JVM is platform dependent: you cannot download an x86 JVM and run it on an ARM processor, or vice versa.
The idea is that the JVM is platform dependent but your code is not.
The java program life cycle goes as follows.
Source code is compiled into Java Byte Code (aka .class files),
Java Byte Code is then interpreted by the JVM, which also performs Just In Time compilation, producing instructions your specific processor architecture can understand.
It's important to understand that compilation is just another way to say "translation", and does not always mean compiling to a native binary. Interpretation is similar, but is done per instruction, as needed by the program.
More specifically for your question, JIT compilation is translation done by the JVM at run time, and each JVM is coded specifically for its processor architecture.

How exactly does the Java interpreter or any interpreter work?

I have been trying to figure out exactly how an interpreter works. I have googled around and come to some conclusions, and I would like them to be corrected by someone who can give me a better understanding of how an interpreter works.
So what I have understood is:
1) An interpreter is a software program that converts code from a high level language to machine format.
2) Speaking specifically about the Java interpreter, it gets code in binary format (which was earlier translated by the Java compiler from source code to bytecode).
3) The platform for the Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
4) So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
5) The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
Now I am still not clear about the sub-process that happens in between, i.e.
1. the interpreter produces intermediate code,
2. the interpreted code is then optimized,
3. then target code is generated,
4. and finally it is executed.
Some more questions:
Is the interpreter alone responsible for generating target code and executing it?
And does executing mean it gets executed in the JVM or in the underlying OS?
An interpreter is a software program that converts code from high level language to machine format.
No. That's a compiler. An interpreter is a computer program that directly executes the instructions written in a language. This is different from a compiler, which converts a higher level language into a lower level language. The C compiler goes from C to assembly code, and the assembler (another type of compiler) translates from assembly to machine code; modern C compilers do both steps to go from C to machine code.
In Java, the Java compiler checks the code and converts Java source into byte-code class files. It also does a number of small processing tasks such as pre-calculating constants (where possible), pooling string literals, etc.
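The pre-calculation of constants can be observed from inside a program. The sketch below (class and method names are made up for illustration) relies on two rules of the Java language: a concatenation of compile-time constants is evaluated by javac, and the resulting string is interned like any other literal. Comparing references with == is normally a bug; here it is used deliberately to show whether a new String object was created at run time.

```java
public class ConstantFolding {
    // A final variable initialized with a literal is a compile-time constant.
    static final String PREFIX = "byte";

    // javac folds PREFIX + "code" into the single literal "bytecode",
    // so both sides refer to the same interned String object.
    static boolean foldedAtCompileTime() {
        return (PREFIX + "code") == "bytecode";
    }

    // A non-final local variable is not a compile-time constant,
    // so the concatenation happens at run time and yields a new object.
    static boolean concatenatedAtRuntime() {
        String prefix = "byte";
        return (prefix + "code") == "bytecode";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());   // true
        System.out.println(concatenatedAtRuntime()); // false
    }
}
```

Disassembling the class with javap -c should show the folded literal directly in the constant pool for the first method.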
now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
The JVM operates on the bytecode directly. The java interpreter is integrated so closely with the JVM that they shouldn't really be thought of as separate entities. What also is happening is a crap-ton of optimization where bytecode is basically optimized on the fly. This makes calling it just an interpreter inadequate. See below.
so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
The JVM is doing these translations.
JVM in turns executes that code on the OS platform in which JVM is implemented or being run.
I'd rather say that the JVM uses the bytecode, optimized user code, the java libraries which include java and native code, in conjunction with OS calls to execute java applications.
now i am still not clear with the sub process that happens in between i.e. 1. interpreter produces intermediate code. 2. interpreted code is then optimized. 3. then target code is generated 4. and finally executed.
The Java compiler generates bytecode. When the JVM executes the code, steps 2-4 happen at runtime inside of the JVM. It is very different than C (for example) which has these separate steps being run by different utilities. Don't think about this as "subprocesses", think about it as modules inside of the JVM.
so is the interpreter alone responsible for generating target code ? and executing it ?
Sort of. The JVM's interpreter by definition reads the bytecode and executes it directly. However, in modern JVMs, the interpreter works in tandem with the Just-In-Time compiler (JIT) to generate native code on the fly so that the JVM can have your code execute more efficiently.
In addition, there are post-processing "compilation" stages which analyze the generated code at runtime so that native code can be optimized by inlining often-used code blocks and through other mechanisms. This is the reason why the JVM load spikes so high on startup. Not only is it loading in the jars and class files, but it is in effect doing a cc -O3 on the fly.
and does executing means it gets executed in JVM or in the underlying OS ?
Although we talk about the JVM executing the code, this is not technically correct. As soon as the byte-code is translated into native code, the execution of the JVM and your java application is done by the CPU and the rest of the hardware architecture.
The Operating System is the substrate that does all of the process and resource management so that programs can share the hardware and execute efficiently. The OS also provides the APIs for applications to easily access the disk, network, memory, and other hardware and resources.
1) An interpreter is a software program that converts code from high level language to machine format.
Incorrect. An interpreter is a program that runs a program expressed in some language that is NOT the computer's native machine code.
There may be a step in this process in which the source language is parsed and translated to an intermediate language, but this is not a fundamental requirement for an interpreter. In the Java case, the bytecode language has been designed so that neither parsing nor a distinct intermediate language is necessary.
2) speaking specifically about java interpreter, it gets code in binary format (which is earlier translated by java compiler from source code to bytecode).
Correct. The "binary format" is Java bytecodes.
3) now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
Incorrect. The bytecode interpreter is part of the JVM. The interpreter doesn't run on the JVM. And the bytecode interpreter doesn't produce anything. It just runs the bytecodes.
4) so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
Incorrect.
5) JVM in turns executes that code on the OS platform in which JVM is implemented or being run.
Incorrect.
The real story is this:
The JVM has a number of components to it.
One component is the bytecode interpreter. It executes bytecodes pretty much directly [1]. You can think of the interpreter as an emulator for an abstract computer whose instruction set is bytecodes.
A second component is the JIT compiler. This translates bytecodes into the target machine's native machine code so that it can be executed by the target hardware.
[1] A typical bytecode interpreter does some work to map abstract stack frames and object layouts to concrete ones involving target-specific sizes and offsets. But to call this an "intermediate code" is a stretch. The interpreter is really just enhancing the bytecodes.
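To make the "emulator for an abstract computer" idea concrete, here is a toy sketch of an interpreter loop for a stack machine. The four opcodes are invented for this example and are not real JVM opcodes, and a real JVM interpreter is far more elaborate. Note that the loop never produces any code: it just fetches an instruction, acts on it, and moves on.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyStackMachine {
    // Hypothetical instruction set, for illustration only.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Fetch-decode-execute loop over an array of instructions.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;              // operand follows the opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();                          // result is on top of the stack
                default: throw new IllegalStateException("Unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // 20
    }
}
```

A JIT compiler, by contrast, would take the same instruction array and emit native machine code that computes the result without going through the switch at all.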
Giving a 1000 foot view which will hopefully clear things up:
There are 2 main steps to a java application: compilation, and runtime. Each process has very different functions and purposes. The main processes for both are outlined below:
Compilation
This is (normally) executed by com.sun.tools.javac, usually found in the tools.jar file, traditionally in your $JAVA_HOME, the same place as java.jar, etc.
The goal here is to translate .java source files into .class files which contain the "recipe" for the java runtime environment.
Compilation steps:
Parsing: the files are read, and stripped of their 'boundary' syntax characters, such as curly braces, semicolons, and parentheses. These exist to tell the parser which Java object to translate each source component into (more about this in the next point).
AST creation: The Abstract Syntax Tree is how a source file is represented. This is a literal "tree" data structure, and the root class for this is com.sun.tools.javac.tree.JCTree. The overall idea is that there is a Java object for each Expression and each Statement. At this point relatively little is known about the actual "types" that each one represents. The only thing that is checked at the creation of the AST is literal syntax.
Desugar: This is where for loops and other syntactic sugar are translated into a simpler form. The program is still in 'tree' form and not bytecode, so this can easily happen.
Type checking/Inference: Where the compiler gets complex. Java is a statically typed language, so the compiler has to go over the AST using the Visitor Pattern, figure out the types of everything ahead of time, and make sure that at runtime everything (well, almost) will be legal as far as types, method signatures, etc. go. If something is too vague or invalid, compilation fails.
Bytecode: Control flow is checked to make sure that the program execution logic is valid (no unreachable statements, etc.) If everything passes the checks without errors, then the AST is translated into the bytecodes that the program represents.
.class file writing: at this point, the class files are written. Essentially, the bytecode is a small layer of abstraction on top of specialized machine code. This makes it possible to port to other machines/CPU structures/platforms without having to worry about the relatively small differences between them.
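As an illustration of the desugaring step in the list above, the enhanced for loop over an Iterable is defined by the language specification in terms of an explicit Iterator loop. The sketch below writes both forms by hand (the class and method names are made up); javac's internal tree rewriting is more involved, but the two methods should behave identically.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DesugarSketch {
    // What the programmer writes.
    static int sumWithForEach(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    // Roughly what the loop above means after desugaring:
    // an explicit iterator, plus the unboxing conversion on each element.
    static int sumDesugared(List<Integer> xs) {
        int total = 0;
        for (Iterator<Integer> it = xs.iterator(); it.hasNext(); ) {
            int x = it.next().intValue();
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        System.out.println(sumWithForEach(xs)); // 6
        System.out.println(sumDesugared(xs));   // 6
    }
}
```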
Runtime
There is a different Runtime Environment/Virtual Machine implementation for each computer platform. The Java APIs are universal, but the runtime environment is an entirely separate piece of software.
JRE only knows how to translate bytecode from the class files into machine code compatible with the target platform, and that is also highly optimized for the respective platform.
There are many different runtime/vm implementations, but the most popular one is the Hotspot VM.
The VM is incredibly complex and optimizes your code at runtime. Startup times are slow but it essentially "learns" as it goes.
This is the 'JIT' (Just-in-time) concept in action - the compiler did all of the heavy lifting by checking for correct types and syntax, and the VM simply translates and optimizes the bytecode to machine code as it goes.
Also...
The Java compiler API was standardized under JSR 199. Separately, the class file format and the bytecode instruction set are defined by the Java Virtual Machine Specification, and many other languages and tools target that format in order to use the advanced JVM (runtime) technology that Oracle provides, while allowing for different syntax.
See Scala, Groovy, Kotlin, Jython, JRuby, etc. All of these can run on the Java Runtime Environment because they translate their different syntax into JVM bytecode, the same format javac produces. It's pretty neat: anyone can write a high-performance language with whatever syntax they want because of the decoupling of the two. There are adaptations of almost every popular language for the JVM.
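As a rough illustration of the JSR 199 compiler API mentioned above, the sketch below calls the system Java compiler from inside a running program through javax.tools, then reads the first four bytes of the resulting class file. It assumes it runs on a full JDK (on a runtime without the compiler module, getSystemJavaCompiler returns null). The class name CompileInProcess, the generated Hello class and the temp directory prefix are all invented for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileInProcess {
    // Compiles a tiny source file in a temp directory and returns the
    // first four bytes (the "magic number") of the generated .class file.
    static int compileAndReadMagic() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (not running on a JDK?)");
        }
        try {
            Path dir = Files.createTempDirectory("jsr199-demo");
            Path source = dir.resolve("Hello.java");
            String code = "public class Hello { public static int answer() { return 42; } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // Null streams mean "use System.in/out/err"; 0 means success.
            int exitCode = compiler.run(null, null, null, source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }

            // Without a -d option, javac writes Hello.class next to Hello.java.
            try (DataInputStream in = new DataInputStream(Files.newInputStream(dir.resolve("Hello.class")))) {
                return in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", compileAndReadMagic()); // magic = 0xCAFEBABE
    }
}
```

Every valid class file starts with the same four bytes regardless of which OS or which compiler produced it, which is one small, visible piece of the platform independence discussed here.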
I'll answer based on my experience on creating a DSL.
C is compiled because you pass the source code to gcc and then run the stored program, which is machine code.
Python is interpreted because you run programs by passing the program source to the interpreter. The interpreter reads the source file and executes it.
Java is a mix of both because you "compile" the Java file to bytecode, then invoke the JVM to run it. Bytecode isn't machine code; it needs to be interpreted by the JVM. Java is in a tier between C and Python because you cannot do fancy things like an "eval" (evaluating chunks of code or expressions at runtime, as in Python). However, Java has reflection abilities that are impossible for a C program to have. In short, the design of the Java runtime, sitting in an intermediate tier between a purely compiled and an interpreted language, gives the best (and the worst) of the two worlds in terms of performance and flexibility.
However, Python also has a virtual machine and its own bytecode format. The same applies to Perl, Lua, etc. Those interpreters first convert a source file to bytecode, then they interpret the bytecode.
I always wondered why doing this is necessary, until I made my own interpreter for a simulation DSL. My interpreter does a lexical analysis (breaks the source into tokens), converts it to an abstract syntax tree, then evaluates the tree by traversing it. For software engineering's sake I'm using some design patterns, and my code heavily uses polymorphism. This is very slow in comparison to processing an efficient bytecode format that mimics a real computer architecture. My simulations would be way faster if I created my own virtual machine or used an existing one. For evaluating a long numeric expression, for instance, it'll be faster to translate it to something similar to assembly code than to process a branch of an abstract tree, since the latter requires calling a lot of polymorphic methods.
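The two approaches described in this answer can be put side by side in a small sketch. Everything here is hypothetical (the Node interface, the three opcodes, the class name): the same expression is evaluated once by walking the tree through polymorphic calls, and once by first flattening the tree into a linear instruction list and then running that list in a plain loop. It makes no performance claim on its own; it only shows the shape of each approach.

```java
import java.util.ArrayList;
import java.util.List;

public class AstVersusFlatCode {
    // Hypothetical opcodes for the flattened form.
    static final int PUSH = 0, ADD = 1, MUL = 2;

    interface Node {
        int eval();                   // tree walking: one virtual call per node
        void emit(List<Integer> out); // translation into flat postfix code
    }

    static Node num(int value) {
        return new Node() {
            public int eval() { return value; }
            public void emit(List<Integer> out) { out.add(PUSH); out.add(value); }
        };
    }

    static Node binary(int op, Node left, Node right) {
        return new Node() {
            public int eval() {
                return op == ADD ? left.eval() + right.eval() : left.eval() * right.eval();
            }
            public void emit(List<Integer> out) { left.emit(out); right.emit(out); out.add(op); }
        };
    }

    // Runs the flat code with an int array as the operand stack.
    static int runFlat(List<Integer> code) {
        int[] stack = new int[code.size()];
        int sp = 0;
        for (int pc = 0; pc < code.size(); pc++) {
            int op = code.get(pc);
            if (op == PUSH) {
                stack[sp++] = code.get(++pc); // operand follows the opcode
            } else {
                int right = stack[--sp];
                int left = stack[--sp];
                stack[sp++] = (op == ADD) ? left + right : left * right;
            }
        }
        return stack[0];
    }

    // Convenience helper: flatten a tree, then run the flat code.
    static int compileAndRun(Node tree) {
        List<Integer> code = new ArrayList<>();
        tree.emit(code);
        return runFlat(code);
    }

    public static void main(String[] args) {
        Node tree = binary(MUL, binary(ADD, num(2), num(3)), num(4)); // (2 + 3) * 4
        System.out.println(tree.eval());         // 20
        System.out.println(compileAndRun(tree)); // 20
    }
}
```

The flattening step is paid once, after which the instruction list can be executed any number of times without touching the tree again.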
There are two ways of executing a program.
By way of a compiler: this parses a text in the programming language (say .c) to machine code, on Windows .exe. This can then be executed independent of the compiler.
This compilation can be done by compiling several .c files to several object files (intermediate products), and then linking them into a single application or library.
By way of an interpreter: this parses a text in the programming language (say .java) and "immediately" executes the program.
With Java the approach is a bit hybrid/stacked: the Java compiler javac compiles .java to .class files, and possibly zips those into a .jar (or .war, .ear ...). The .class files consist of a more abstract byte code, for an abstract stack machine.
Then the Java runtime java (called the JVM, Java virtual machine, or byte code interpreter) can execute a .class/.jar. This is in fact an interpreter of Java byte code.
Nowadays it also translates (parts) of the byte code at run time to machine code. This is also called a just-in-time compiler for byte code to machine code.
In short:
- a compiler just creates code;
- an interpreter immediately executes.
An interpreter will loop over parsed commands / a high level intermediate code, and interpret every command with a piece of code. This is indirect and in principle slow.

Platform independence in Java's Byte Code

I sometimes wonder why Java is referred to as a platform independent language.
I couldn't find a proper explanation of the points below:
Is the JVM the same for Windows/Linux/Mac OS?
Is the bytecode generated for the same class the same in the above environments?
If the answer to the above questions is NO, then how is platform independence achieved?
Please help me out in learning this basic concept.
Is the JVM same for Windows/Linux/Mac OS?
Not at all. The JVM behaves the same across platforms, but since it is a native executable, the file itself is different on each one: on Windows it is an .exe, on Linux it is a Linux executable, and so on.
Are the bytecode generated same for a same Class in the above environments?
Yes. That is why Java is COMPILE ONCE. RUN ANYWHERE.
Before starting please read this doc by oracle
Machine dependence: this means that what you build to execute on one hardware architecture will not execute on another. For example, an executable built for an ARM processor will not run on an x86 processor. Platform dependence means that an executable you built for Windows won't run on Linux. Code written in assembly (for your processor's instruction set) or in machine language is machine dependent, whereas source code written in C, C++ or Java is machine independent, because the compiler and the underlying OS take care of the machine details.
Platform independence: if you compile C or C++ code, the result is platform dependent, because the compiled file matches the instruction set of the processor and the executable format and system calls of the underlying OS. So you need some mediator that understands both the compiler's output and the OS. Java achieved this by creating the JVM. Note: no language is machine independent if you remove the OS, which is itself a program written in some language that can talk directly to the underlying machine architecture. The OS is the program that takes your compiled code and runs it on top of the underlying architecture.
The meaning of platform independence is that you only have to distribute your Java program in one format.
This one format will be interpreted by JVMs on each platform (which are coded as different programs optimized for the platform they are on) such that it can run anywhere a JVM exists.
1 ) Is the JVM same for Windows/Linux/Mac OS?
Answer ===> NO , JVM is different for All
2 ) Are the bytecode generated same for a same Class in the above environments?
Answer ====> YES , Byte code generated will be the same.
Below explanation will give you more clarification.
{App1(Java code)------>App1byteCode}........{(JVM+MacOS) help work with App1,App2,App3}
{App2(Java Code)----->App2byteCode}........{(JVM+LinuxOS) help work with App1,App2,App3}
{App3(Java Code)----->App3byteCode}........{(JVM+WindowsOS) help work with App1,App2,App3}
How is this happening?
Answer: the JVM is able to read bytecode and respond in accordance with the underlying OS, because each JVM is built for its OS.
So we do need a JVM that matches the platform.
But the main thing is that the programmer does not need specific knowledge of the platform and does not have to write the application with one specific platform in mind.
This flexibility of writing a program in the Java language, compiling it to bytecode and running it on any machine (yes, a platform DEPENDENT JVM is needed to execute it) is what makes Java platform independent.
Java is called a platform independent language because virtually all you need to run your code on any operating system is that system's JVM.
The JVM "maps" your Java code's commands to the system's commands, so you don't have to change your code for any operating system, but just install that system's JVM (which should be provided by Oracle).
The credo is "Write once, run anywhere."
Watch this short video tutorial; I hope it helps you understand why Java is platform independent. Everything is explained in 2 minutes and 37 seconds.
Why Java is platform independent?
https://www.youtube.com/watch?v=Vn8hdwxkyKI
Here is the explanation:
There are two steps required to run any java program i.e.
(i) Compilation &
(ii) Interpretation Steps.
The Java compiler, commonly known as "javac", is used to compile any Java file. During compilation, the compiler processes each and every statement of the Java file. If the program contains any error, the compiler prints an error message to the output screen. On successful completion, the compiler creates a new file known as the Class File / Binary Coded File / Byte Code File / Magic Code File.
The generated class file is a binary file, so the Java interpreter, commonly known as "java", is required to interpret each and every statement of the class file. After interpretation completes successfully, the machine shows the program's output on the output screen.
This generated class file is a binary coded file which depends on the components provided by the Java interpreter (java) and does not depend on the tools and components available in the operating system.
Therefore, we can run a Java program on any type of operating system, provided a Java interpreter is available for that operating system. Hence, Java is known as a platform independent language.
Two things happen when you build and run an application in Java:
The Java compiler (javac) compiles the source into bytecode (stored in a .class file)
The Java bytecode (.class) is OS independent; it has the same extension and the same contents on all the different OSs. But since it is not specific to any OS or other environment, nothing can run it directly (unless there is a machine whose native instruction set is bytecode, i.e. one that understands bytecode itself)
The JVM loads and executes the bytecode
A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Java also has a virtual machine called Java Virtual Machine (JVM).
The JVM has a class loader that loads the compiled Java bytecode into the runtime data areas, and an execution engine that executes it. Importantly, the JVM is platform dependent: there are different JVMs for different operating systems and other environments.
The execution engine must change the bytecode into a language that the machine running the JVM can execute. This includes various tasks such as finding performance bottlenecks and recompiling (to native code) frequently used sections of code. The bytecode can be changed to the suitable language in one of two ways:
Interpreter: reads, interprets and executes the bytecode instructions one by one.
JIT (Just-In-Time) compiler: the JIT compiler was introduced to compensate for the disadvantages of the interpreter. The execution engine runs as an interpreter first and, at the appropriate time, the JIT compiler compiles the entire bytecode of a method to native code. After that, the execution engine no longer interprets the method but executes the native code directly. Execution in native code is much faster than interpreting instructions one by one, and the compiled code can be reused quickly since the native code is stored in a cache.
So, in summary, Java code gets compiled into bytecode, which is platform independent, and Java has a virtual machine (JVM) specific to each platform (operating system etc.) which loads that bytecode and interprets or compiles it into machine-specific code.
Refer :
https://www.cubrid.org/blog/understanding-jvm-internals/
https://docs.oracle.com/javase/tutorial/getStarted/intro/definition.html

What interprets Java's byte code

I was wondering if Java gets assembled, and in my readings I found that the compiler creates byte code which is then run on the Java Virtual Machine. Does the JVM interpret the byte code and execute it?
This is why I'm confused. Today in class the prof said "the compiler takes a high level language, creates assembly language, then the assembler takes the assembly language and creates machine language (binary) which can be run". So if Java compiles to bytecode, how can it be run?
There is a standard compiler setup, such as would be used for the C language, and then there is Java, which is significantly different.
The standard C compiler compiles (through several internal phases) into "machine instructions" which are directly understood by the x86 processor or whatever.
The Java compiler, on the other hand, compiles to what are sometimes called "bytecodes". These are machine instructions, but for an imaginary machine, the Java Virtual Machine. So the JVM interprets the bytecodes just like a "real" machine processes its machine instructions. (The main advantage of this is that a program compiled into bytecodes will run on any JVM, whether it be on an x86 system, an IBM RISC box, or the ARM processor in an Android device: so long as there's a JVM, the code will run.)
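To make "machine instructions for an imaginary machine" concrete, here is a toy sketch of a stack-based interpreter. The opcodes (PUSH, ADD, MUL, HALT) and the class name are invented for this example and are not real JVM opcodes; a real JVM is far more involved, but the fetch-decode-execute loop has the same basic shape.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// ToyStackMachine.java (hypothetical; opcodes are made up, not JVM bytecode)
class ToyStackMachine {
    static final int PUSH = 0; // the next int in the program is the value to push
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    // Interprets the program one instruction at a time.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // "program counter": index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalStateException("Unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

The int array plays the role of the .class file's bytecode, and run plays the role of the JVM's interpreter. The same array would give the same answer wherever this interpreter runs, which is the portability argument in miniature.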
(There have historically been a number of "virtual machines" similar to Java, the UCSD Pascal "P-code" system being one of the more successful ones.)
But it gets more complicated --
Interpreting "bytecodes" is fairly slow and inefficient, so most Java implementations have some sort of scheme to translate the bytecodes into "real" machine instructions. In some cases this is done statically, in a separate compile step, but most often it's done with a "just-in-time compiler" (JITC) which converts small portions of the bytecodes to machine instructions while the application is running. These get to be quite elaborate, with complex schemes to decide which segments of code will benefit most from translating into hardware machine instructions. But they all, for the most part, do their magic without you needing to be aware of what's going on, and without you having to compile your Java code to target a specific type of processor.
Think of bytecode as the machine language of the JVM. (Compilers don't HAVE to produce assembly code which has to be assembled, but they're a lot easier to write that way.)
Just a clarifying note:
What Java calls "bytecode" corresponds to what your original description calls "machine language (binary) which can be run".
So the answer to how to run java bytecode is:
You build a processor which can handle Java bytecode, in the same way that if you want to execute normal x86 code you build a CPU to handle that.
Java's binary machine language is not really different from the binary instruction format of other CPUs such as x86 or PowerPC. And there do exist CPUs which can execute Java bytecode directly. (That would not be a normal Intel/AMD CPU.)
Another example: how would you run PowerPC code on a normal Intel CPU? You would build a piece of software which, at runtime, translates the PowerPC binary code to x86 code. The case for Java is not really that different. So to run Java code on an x86 CPU, you need a program which translates the Java binary code (a.k.a. the bytecode) to x86 binary code. This is what the JVM* does. It does this either by interpreting the Java instructions one at a time, or by translating a larger chunk of instructions at a time (called JIT). Exactly how the JVM handles the translation depends on which JVM implementation you use and on its settings. (There are multiple independent JVM implementations, which implement their translation in different ways.)
But there is one thing which makes Java a bit different. Unlike other binary instruction formats such as x86, Java bytecode was never really designed to run directly on a CPU. Its binary format is designed in a way which makes it easy to translate to binary code for "normal" CPUs such as x86 or PowerPC.
*The JVM does in fact handle more than just translating the Java binary code to processor-dependent code. It also handles memory allocation for Java programs, and it handles communication between a Java program and the user's operating system. This is done to make the Java program relatively independent of the user's operating system and platform details.
In a short explanation: The JVM translates the Java Byte Code into machine specific code. The generated machine specific code is then executed by the machine.
The Java compiler translates Java source into bytecode. The JVM translates the bytecode into machine-specific code at runtime. The machine executes that code.
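If you want to look at the first half of that pipeline, the JDK's javap tool can disassemble a .class file. A small sketch, with a made-up class name Adder:

```java
// Adder.java (hypothetical demo class)
class Adder {
    // javac compiles this method into a handful of bytecode instructions.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

After javac Adder.java, running javap -c Adder should show the body of add as roughly these four instructions: iload_0, iload_1, iadd, ireturn. That is, push the first argument, push the second, add them, return the result. That listing is the same whichever platform you compiled on; what differs per platform is the machine code the JVM produces from it at runtime.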

Java compiler/interpreter

Why do we say that Java is a compiled and interpreted language?
What is the advantage of this (being compiled and interpreted)?
Java is compiled to an intermediate "byte code" at compilation time. This is in contrast to a language like C that is compiled to machine language at compilation time. The Java byte code cannot be directly executed on hardware the way that compiled C code can. Instead the byte code must be interpreted by the JVM (Java Virtual Machine) at runtime in order to be executed. The primary drawback of a language like C is that when it is compiled, that binary file will only work on one particular architecture (e.g. x86).
Interpreted languages like PHP are effectively system independent and rely on a system and architecture specific interpreter. This leads to much greater portability (the same PHP scripts work on Windows machines and Linux machines, etc.). However, this interpretation leads to a significant performance decrease. High-level languages like PHP require more time to interpret than machine-specific instructions that can be executed by the hardware.
Java seeks to find a compromise between a purely compiled language (with no portability) and a purely interpreted language (that is significantly slower). It accomplishes this by compiling the code into a form that is closer to machine language (actually, Java byte code is a machine language, simply for the Java Virtual Machine), but can still be easily transported between architectures. Because Java still requires a software layer for execution (the JVM) it is an interpreted language. However, the interpreter (the JVM) operates on an intermediate form known as byte code rather than on the raw source files. This byte code is generated at compile time by the Java compiler. Therefore, Java is also a compiled language. By operating this way, Java gets some of the benefits of compiled languages, while also getting some of the benefits of interpreted languages. However, it also inherits some limitations from both of these languages.
As Bozho points out, there are some strategies for increasing the performance of Java code (and of other byte code platforms like .NET) through the use of Just in Time (JIT) compilation. The actual process varies from implementation to implementation based on the requirements, but the end result is that the original code is compiled into byte code at compile time, and is then run through a compiler at runtime before it is executed. By doing this, the code can be executed at near-native speeds. Some platforms (I believe .NET does this) save the result of the JIT compilation, replacing the byte code. By doing this, all future executions of the program will execute as though the program was natively compiled from the beginning.
Why do we say Java is a compiled and interpreted language?
Because source code (.java files) is compiled into bytecode (.class files) that is then interpreted by a Java Virtual Machine (also known as a JVM) for execution (the JVM can do further optimization, but this is another story).
What is the advantage of this (being compiled/interpreted)?
Portability. The same bytecode can be executed on any platform as long as a JVM is installed ("compile once, run anywhere").
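A quick way to check the portability claim, as a rough sketch (the class name WhereAmI is made up): compile the class once, copy the resulting .class file to machines with different operating systems, and run it on each without recompiling.

```java
// WhereAmI.java (hypothetical demo class)
class WhereAmI {
    // Builds a one-line description of the platform and JVM
    // from standard system properties.
    static String describe() {
        return System.getProperty("os.name") + " / "
                + System.getProperty("os.arch") + " / "
                + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("Running on: " + describe());
    }
}
```

The same WhereAmI.class should report a different operating system, architecture and JVM on each machine, assuming each installed JVM is recent enough for the class file version that javac produced.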
This is a long topic and you'd better read about JIT. In short, Java is compiled to bytecode, and the bytecode is later compiled (in the JVM) to machine code.
Java is considered a "compiled" language because code is compiled into bytecode format that is then run by the Java Virtual Machine (JVM). This gives several advantages in the realm of performance and code optimization, not to mention ensuring code correctness.
It is considered an "interpreted" language because, after the bytecode is compiled, it is runnable on any machine that has a JVM installed. It is in this way that Java is much like an interpreted language in that, for the most part, it doesn't depend on the platform on which it is being run. This behavior is similar to other interpreted languages such as Perl, Python, PHP, etc.
One theoretical downside to the fact that Java programs can be run on any system in absence of the source code is that, while this method of distribution ensures cross-platform compatibility, the developers have one less reason to release their source code, driving a wedge between the ideological meanings of the phrases "cross-platform" and "open source".
Java is compiled into byte code, not into native binaries. The byte code is not directly executable; it needs the Java Virtual Machine to do a just-in-time compile, compiling it again into machine code at runtime.
At a very basic level, this separates the code that programmers write from the local machine the JVM runs on, hence better portability. Compiling to bytecode also helps the performance of just-in-time compilation, reduces file sizes, and more or less helps conceal the real code. (It also eliminates some errors at compile time.)
Compiled: Your program is a syntactically correct Java program before it starts.
Interpreted: The same (byte-)code runs on different platforms.
Compiled: When your program has compiled correctly, you can be sure to have 80% of software bugs under control. And your code will not stop because you have not correctly closed a code block, etc.
Interpreted: You know what applets are? They were the "killer" application when Java came out. Your browser downloads the applet from the website and runs the applet code in the browser. That in itself is not very cool. But the same applet runs on Windows, Linux, Macs, Solaris, ... because what gets run/interpreted is an intermediate language: the byte code.
