In Java, does "binary code" mean the same as "Java bytecode"?
Is this the flow in Java?
Java File (.java) -> [javac] -> ByteCode File (.class) -> [JVM/Java Interpreter] -> Running it (by first converting it into binary code specific to the machine)
Thanks!
The answer depends on what you mean by binary code.
Java bytecode is a binary data format that includes loading information and execution instructions for the Java virtual machine. In that sense, Java bytecode is a special kind of binary code.
When you use the term "binary code" to mean machine instructions for a real processor's architecture (like IA-32 or SPARC), then it is different.
Java bytecode is not a binary code in that sense. It is not processor-specific.
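To make "binary data format" concrete: every valid .class file begins with the 4-byte magic number 0xCAFEBABE, which you can check yourself. A small sketch (it reads java.lang.Object's class file out of the runtime's own resources, which works on standard JDKs):

```java
import java.io.DataInputStream;
import java.io.InputStream;

public class MagicNumber {
    public static void main(String[] args) throws Exception {
        // Every valid .class file starts with the magic number 0xCAFEBABE,
        // followed by the minor and major class-file version numbers.
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class");
             DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();
            System.out.printf("magic = 0x%X%n", magic); // prints "magic = 0xCAFEBABE"
        }
    }
}
```

Opening the same file in a text editor shows gibberish - it is binary - yet none of it is machine code for any physical processor.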
The JVM is a very complex program, and the flow there is to a certain degree unpredictable. E.g. the flow inside the HotSpot JVM is something like the following:
1) it takes your bytecode and interprets it
2) if some method is executed frequently enough (some number of times during some time span), it is marked as a "hot" method and the JVM schedules its compilation to platform-dependent machine code (is that what you have called binary code?). That flow looks like the following:
ByteCode
--> High-level Intermediate Representation (HIR)
--> Middle-level Intermediate Representation (MIR)
--> Low-level Intermediate Representation (LIR)
--> Register Allocation
--> EMIT (platform dependent machine code)
Each step in that flow is important and helps the JVM perform some optimizations of your code. It does not change your algorithm, of course; optimization just means that some sequences of code can be detected and replaced with better-performing code (producing the same result). Starting from the LIR stage, the code becomes platform-dependent (!).
Bytecode is good for interpretation, but not convenient to transform directly into native machine code. HIR takes care of that: its purpose is to quickly transform bytecode into an intermediate representation. MIR then transforms all operations into three-operand form; bytecode is based on stack operations:
iload_0
iload_1
iand
That was the bytecode for a simple "and" operation; the middle-level representation for it will be something like the following:
and v0 v1 -> v2
LIR depends on the platform. Taking our simple example with the "and" operation, and specifying our platform as x86, our code snippet becomes:
x86_and v1 v0 -> v1
x86_move v1 -> v2
because the x86 "and" operation takes two operands: the first one is the destination, the other one is the source, and then we put the result value into another "variable". The next stage is register allocation, because the x86 platform (and probably most others) works with registers, not variables (like the intermediate representation) or a stack (like bytecode). Here our code snippet becomes the following:
x86_and eax ecx -> eax
And here you can notice the absence of a "move" operation. Our code contained only one line, and the JVM figured out that creating a new virtual variable was not needed; we can just reuse the eax register. If the code is big enough, with many variables used intensively (e.g. eax is used somewhere below, so we can't change its value), then you will see the move operation left in the machine code. That's again about optimization :)
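For reference, the iload_0 / iload_1 / iand snippet that started this flow is what javac emits for a static method like the one below (you can verify it yourself with `javap -c`):

```java
public class AndExample {
    // javap -c AndExample shows the body of and() compiles to:
    //   iload_0   // push first argument onto the operand stack
    //   iload_1   // push second argument
    //   iand      // pop both, push their bitwise AND
    //   ireturn   // return the top of the stack
    static int and(int a, int b) {
        return a & b;
    }

    public static void main(String[] args) {
        System.out.println(and(6, 3)); // 0b110 & 0b011 = 0b010, prints 2
    }
}
```

The bytecode shuffles values through a stack; the HIR/MIR stages above rewrite exactly this into the three-operand `and v0 v1 -> v2` form.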
That was the JIT flow, but depending on the VM implementation there can be one more step - if the code was compiled (being "hot") and is still executed many, many times, the JVM schedules further optimization of that code (e.g. inlining).
Well, the conclusion is that the path from bytecode to machine code is quite interesting, a bit unforeseeable, and depends on many, many things.
Btw, the process described above is called "mixed-mode interpretation" (when the JVM first interprets bytecode and then uses JIT compilation); an example of such a JVM is HotSpot. Some JVMs (like JRockit from Oracle) use JIT compilation only.
This was a very simple description of what is going on there. I hope that it helps to understand the flow inside JVM on a very high level, as well as targets the question about differences between bytecode and binary code. For references, and other issues not mentioned here and related to that topic, please read the similar topic "Why are compiled Java class files smaller than C compiled files?".
Also feel free to critique this answer, point me to mistakes or misunderstanding of mine, I'm always willing to improve my knowledge about JVM :)
There's no such thing as "machine-independent-bytecode" (it wouldn't make any sense if you think about it). Bytecode is only (for the purposes of this answer) used for things like virtual machines. VMs (such as the JVM) INTERPRET the bytecode and use some clever and complicated just-in-time compilation (which IS machine/platform-dependent) to give you the final product.
So in a sense, both of the answers are right and wrong. The Java compiler compiles code into Java bytecode (machine-independent). The *.class files the bytecode is located in are binary - they are executable, after all. The Virtual machine later interprets these binary *.class files (note: when describing files as binary, it's somewhat of a misnomer) and does various awesome stuff. More often than not, the JVM uses something called JIT (just-in-time compilation), which generates either platform-specific, or machine-specific instructions that speed up various parts of execution. JIT is another topic for another day, however.
Edit:
Java File (.java) -> [javac.exe] -> ByteCode File (.class) -> [JVM/Java Interpreter] -> Running it(by first converting it into binary code specific to the machine)
This is incorrect. The JVM doesn't "convert" anything. It simply interprets the bytecode. The only part of the JVM that "converts" bytecode is when the JIT compiler is invoked, which is a special case and should not be generalized.
Both C/C++ (to take an example) and Java programs are compiled into binary code. This generic term just means that the newly created file does not encode the instructions in a human-readable way (i.e. you won't be able to open the compiled file in a text editor and read it).
On the other hand, what the binary 0's and 1's encode (or represent) depends on what the compiler generated. In the case of Java, it generates instructions called bytecode, which are interpreted by the JVM. In other cases, for other languages, it may generate IA-32 or SPARC instructions.
In conclusion, the way the terms "binary code" and "Java bytecode" are opposed to each other is misleading. The point was to make a distinction between normal binary code, which is machine-dependent, and Java bytecode (also a binary code), which is not.
An answer I found today for the above question:
Source: JLS
Loading refers to the process of finding the binary form of a class or interface type with a particular name, perhaps by computing it on the fly, but more typically by retrieving a binary representation previously computed from source code by a Java compiler, and constructing, from that binary form, a Class object to represent the class or interface.
The precise semantics of loading are given in Chapter 5 of The Java Virtual Machine Specification, Java SE 7 Edition. Here we present an overview of the process from the viewpoint of the Java programming language.
The binary format of a class or interface is normally the class file format described in The Java Virtual Machine Specification, Java SE 7 Edition cited above, but other formats are possible, provided they meet the requirements specified in §13.1. The method defineClass of class ClassLoader may be used to construct Class objects from binary representations in the class file format.
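A minimal sketch of what the quoted defineClass path looks like in code: a custom ClassLoader that turns class-file bytes into a Class object. Loading the bytes from a file is just one possible source; the class name and path in the usage note are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// A ClassLoader that constructs a Class object from a binary representation
// in the class file format, via the protected defineClass method.
public class BytesClassLoader extends ClassLoader {
    public Class<?> loadFromFile(String className, Path classFile) throws IOException {
        byte[] bytes = Files.readAllBytes(classFile);
        // defineClass turns the raw class-file bytes into a Class object
        return defineClass(className, bytes, 0, bytes.length);
    }
}
```

Usage would look like `new BytesClassLoader().loadFromFile("Foo", Path.of("Foo.class"))` - assuming a compiled `Foo.class` exists on disk.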
When talking about programs, the term binary code usually denotes an executable program in binary form (encoded as a sequence of bits).
In other words, binary code is any compiled program, as opposed to scripts, which are distributed and executed (interpreted) in the form of text.
Binary code can be of two kinds: machine code and bytecode. Machine code is a program encoded in accordance with the specification of a real hardware microprocessor; thus, it can be executed directly by the target microprocessor without the mediation of any other software. In contrast, bytecode is a program encoded in accordance with the specification of some virtual microprocessor (a virtual machine). Hence, for execution it must either be interpreted or translated into machine code and then directly executed.
In this way, every bytecode is a binary code, but not every binary code is bytecode. In the context of your question, "Java bytecode" is unconditionally "binary code", but "binary code" is not necessarily "Java bytecode".
Related
I have been figuring out the exact working of an interpreter, have googled around, and have come to some conclusions; I just want them to be rectified by someone who can give me a better understanding of the working of an interpreter.
So what I have understood is:
An interpreter is a software program that converts code from high level language to machine format.
Speaking specifically about the Java interpreter, it gets code in binary format (which is earlier translated by the Java compiler from source code to bytecode).
Now the platform for a Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
Now I am still not clear on the sub-process that happens in between, i.e.:
1. the interpreter produces intermediate code.
2. the interpreted code is then optimized.
3. then target code is generated.
4. and finally it is executed.
Some more questions:
So is the interpreter alone responsible for generating the target code, and for executing it?
And does executing mean it gets executed in the JVM or in the underlying OS?
An interpreter is a software program that converts code from high level language to machine format.
No. That's a compiler. An interpreter is a computer program that executes instructions written in a language directly. This is different from a compiler, which converts a higher-level language into a lower-level one. A classic C compiler goes from C to assembly code, and then the assembler (another type of compiler) translates assembly to machine code -- modern C compilers do both steps to go from C to machine code.
In Java, the Java compiler does code verification and converts Java source to bytecode class files. It also does a number of small processing tasks such as pre-calculation of constants (where possible), caching of strings, etc.
Now the platform for a Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
The JVM operates on the bytecode directly. The java interpreter is integrated so closely with the JVM that they shouldn't really be thought of as separate entities. What also is happening is a crap-ton of optimization where bytecode is basically optimized on the fly. This makes calling it just an interpreter inadequate. See below.
So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
The JVM is doing these translations.
The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
I'd rather say that the JVM uses the bytecode, optimized user code, the java libraries which include java and native code, in conjunction with OS calls to execute java applications.
Now I am still not clear on the sub-process that happens in between, i.e. 1. the interpreter produces intermediate code. 2. the interpreted code is then optimized. 3. then target code is generated. 4. and finally executed.
The Java compiler generates bytecode. When the JVM executes the code, steps 2-4 happen at runtime inside of the JVM. It is very different than C (for example) which has these separate steps being run by different utilities. Don't think about this as "subprocesses", think about it as modules inside of the JVM.
So is the interpreter alone responsible for generating the target code, and for executing it?
Sort of. The JVM's interpreter by definition reads the bytecode and executes it directly. However, in modern JVMs, the interpreter works in tandem with the Just-In-Time compiler (JIT) to generate native code on the fly so that the JVM can have your code execute more efficiently.
In addition, there are post-processing "compilation" stages which analyze the generated code at runtime so that native code can be optimized by inlining often-used code blocks and through other mechanisms. This is the reason why the JVM load spikes so high on startup. Not only is it loading in the jars and class files, but it is in effect doing a cc -O3 on the fly.
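A tiny program makes this observable: run it under HotSpot's (real) diagnostic flag -XX:+PrintCompilation and you will see the hot method listed once the loop has crossed the JIT's invocation threshold. The loop count here is just a guess chosen to comfortably exceed HotSpot's defaults, not a guaranteed threshold.

```java
public class JitDemo {
    // A small method that gets called often enough to be marked "hot"
    // and handed from the interpreter to the JIT compiler.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Tens of thousands of calls - enough, typically, for HotSpot to
        // schedule square() for compilation to native machine code.
        for (long i = 0; i < 100_000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
        // Try: java -XX:+PrintCompilation JitDemo
        // and look for a log line mentioning JitDemo::square
    }
}
```

The program's output is identical with or without the JIT; only the speed (and the compilation log) changes, which is the whole point of mixed-mode execution.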
And does executing mean it gets executed in the JVM or in the underlying OS?
Although we talk about the JVM executing the code, this is not technically correct. As soon as the byte-code is translated into native code, the execution of the JVM and your java application is done by the CPU and the rest of the hardware architecture.
The operating system is the substrate that does all of the process and resource management so that programs can share the hardware and execute efficiently. The OS also provides the APIs for applications to easily access the disk, network, memory, and other hardware and resources.
1) An interpreter is a software program that converts code from high level language to machine format.
Incorrect. An interpreter is a program that runs a program expressed in some language that is NOT the computer's native machine code.
There may be a step in this process in which the source language is parsed and translated to an intermediate language, but this is not a fundamental requirement for an interpreter. In the Java case, the bytecode language has been designed so that neither parsing nor a distinct intermediate language is necessary.
2) speaking specifically about java interpreter, it gets code in binary format (which is earlier translated by java compiler from source code to bytecode).
Correct. The "binary format" is Java bytecodes.
3) now platform for a java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by JVM.
Incorrect. The bytecode interpreter is part of the JVM. The interpreter doesn't run on the JVM. And the bytecode interpreter doesn't produce anything. It just runs the bytecodes.
4) so it takes the bytecode produces intermediate code and the target machine code and gives it to JVM.
Incorrect.
5) JVM in turns executes that code on the OS platform in which JVM is implemented or being run.
Incorrect.
The real story is this:
The JVM has a number of components to it.
One component is the bytecode interpreter. It executes bytecodes pretty much directly [1]. You can think of the interpreter as an emulator for an abstract computer whose instruction set is bytecodes.
A second component is the JIT compiler. This translates bytecodes into the target machine's native machine code so that it can be executed by the target hardware.
[1] A typical bytecode interpreter does some work to map abstract stack frames and object layouts to concrete ones involving target-specific sizes and offsets. But to call this an "intermediate code" is a stretch. The interpreter is really just enhancing the bytecodes.
Giving a 1000 foot view which will hopefully clear things up:
There are 2 main steps to a java application: compilation, and runtime. Each process has very different functions and purposes. The main processes for both are outlined below:
Compilation
This is (normally) executed by com.sun.tools.javac, usually found in the tools.jar file, traditionally in your $JAVA_HOME - the same place as java.jar, etc.
The goal here is to translate .java source files into .class files which contain the "recipe" for the java runtime environment.
Compilation steps:
Parsing: the files are read and stripped of their 'boundary' syntax characters, such as curly braces, semicolons, and parentheses. These exist to tell the parser which Java object to translate each source component into (more about this in the next point).
AST creation: The Abstract Syntax Tree is how a source file is represented. This is a literal "tree" data structure, and the root class for it is com.sun.tools.javac.tree.JCTree. The overall idea is that there is a Java object for each Expression and each Statement. At this point relatively little is known about the actual "types" each represents; the only thing checked at the creation of the AST is literal syntax.
Desugaring: This is where for loops and other syntactic sugar are translated into simpler forms. The program is still in 'tree' form and not bytecode, so this can happen easily.
Type checking/inference: Where the compiler gets complex. Java is a statically typed language, so the compiler has to walk the AST using the visitor pattern, figure out the types of everything ahead of time, and make sure that at runtime everything (well, almost everything) will be legal as far as types, method signatures, etc. go. If something is too vague or invalid, compilation fails.
Bytecode generation: Control flow is checked to make sure that the program's execution logic is valid (no unreachable statements, etc.). If everything passes the checks without errors, then the AST is translated into the bytecodes that the program represents.
.class file writing: at this point, the class files are written. Essentially, the bytecode is a small layer of abstraction on top of specialized machine code. This makes it possible to port to other machines/CPU structures/platforms without having to worry about the relatively small differences between them.
Runtime
There is a different Runtime Environment/Virtual Machine implementation for each computer platform. The Java APIs are universal, but the runtime environment is an entirely separate piece of software.
The JRE only knows how to translate bytecode from the class files into machine code compatible with the target platform, code that is also highly optimized for that platform.
There are many different runtime/vm implementations, but the most popular one is the Hotspot VM.
The VM is incredibly complex and optimizes your code at runtime. Startup times are slow but it essentially "learns" as it goes.
This is the 'JIT' (Just-in-time) concept in action - the compiler did all of the heavy lifting by checking for correct types and syntax, and the VM simply translates and optimizes the bytecode to machine code as it goes.
Also...
The Java compiler API was standardized under JSR 199. While not exactly falling under the same thing (I can't find the exact JLS), many other languages and tools leverage the standardized compilation process/API in order to use the advanced JVM (runtime) technology that Oracle provides, while allowing for different syntax.
See Scala, Groovy, Kotlin, Jython, JRuby, etc. All of these leverage the Java Runtime Environment by translating their different syntax into something compatible with the Java compiler API! It's pretty neat - anyone can write a high-performance language with whatever syntax they want because of the decoupling of the two. There are adaptations of almost every single language for the JVM.
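As a small illustration of the JSR 199 API mentioned above, here is a sketch that invokes the compiler programmatically; the Hello class and the temp directory are made up for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileDemo {
    public static void main(String[] args) throws Exception {
        // Write a tiny source file to a temp directory.
        Path dir = Files.createTempDirectory("jsr199");
        Path src = dir.resolve("Hello.java");
        Files.writeString(src, "public class Hello { static String greet() { return \"hi\"; } }");

        // ToolProvider.getSystemJavaCompiler() is the standard JSR 199 entry
        // point (it returns null on a JRE that ships without the compiler).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        int result = javac.run(null, null, null, src.toString());

        // 0 means success; Hello.class now sits next to Hello.java.
        System.out.println(result == 0 && Files.exists(dir.resolve("Hello.class"))); // prints true
    }
}
```

Tools built on this API get the same bytecode pipeline javac itself uses, which is exactly why alternative JVM languages can reuse the runtime machinery.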
I'll answer based on my experience on creating a DSL.
C is compiled because you pass the source code to gcc and then run the stored program in machine code.
Python is interpreted because you run programs by passing the program source to the interpreter. The interpreter reads the source file and executes it.
Java is a mix of both because you "compile" the Java file to bytecode, then invoke the JVM to run it. Bytecode isn't machine code; it needs to be interpreted by the JVM. Java sits in a tier between C and Python because you cannot do fancy things like "eval" (evaluating chunks of code or expressions at runtime, as in Python). However, Java has reflection abilities that are impossible for a C program to have. In short, the design of the Java runtime, in an intermediate tier between a purely compiled and an interpreted language, gives the best (and the worst) of the two worlds in terms of performance and flexibility.
However, Python also has a virtual machine and its own bytecode format. The same applies to Perl, Lua, etc. Those interpreters first convert a source file to bytecode, then they interpret the bytecode.
I always wondered why doing this was necessary, until I made my own interpreter for a simulation DSL. My interpreter does lexical analysis (breaking the source into tokens), converts it to an abstract syntax tree, then evaluates the tree by traversing it. For software engineering's sake I'm using some design patterns, and my code heavily uses polymorphism. This is very slow in comparison to processing an efficient bytecode format that mimics a real computer architecture. My simulations would be much faster if I created my own virtual machine or used an existing one. For evaluating a long numeric expression, for instance, it is faster to translate it into something similar to assembly code than to process a branch of an abstract tree, since the latter requires calling a lot of polymorphic methods.
There are two ways of executing a program.
By way of a compiler: this parses a text in the programming language (say .c) and translates it to machine code, on Windows a .exe. This can then be executed independently of the compiler.
This compilation can be done by compiling several .c files to several object files (intermediate products), and then linking them into a single application or library.
By way of an interpreter: this parses a text in the programming language (say .java) and "immediately" executes the program.
With Java the approach is a bit hybrid/stacked: the Java compiler javac compiles .java to .class files, and possibly zips those into a .jar (or .war, .ear, ...). The .class files consist of a more abstract byte code, for an abstract stack machine.
Then the Java runtime java (called the JVM, Java virtual machine, or byte code interpreter) can execute a .class/.jar. It is in fact an interpreter of Java byte code.
Nowadays it also translates (parts of) the byte code at run time to machine code. This is also called a just-in-time compiler for byte code to machine code.
In short:
- a compiler just creates code;
- an interpreter immediately executes.
An interpreter loops over parsed commands / a high-level intermediate code, and interprets every command with a piece of code. Indirect and in principle slow.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am new to Java and a bit confused about the role of the compiler and the JVM. I have read a couple of resources, to name a few:
What-are-the-functions-of-Java-Compiler ?
Is the JVM a compiler or an interpreter?
Compiler
As I save the .java file, the computer internally stores it as bytes, 0's and 1's. I understand the compiler validates whether the written Java program conforms to the Java standard or not; if not, it throws errors, otherwise it generates the class file.
My question is: what is the need of generating a .class file? Can't the JVM interpret it directly (from the bytes corresponding to the .java file) without needing a .class file? Does the compiler (javac) do any kind of optimization here?
JVM: This question is the other way around. Can't the compiler generate byte/machine code which can be interpreted by the CPU directly? So why is the JVM needed here? Is the JVM required to interpret the byte code specific to the platform, for example Windows or Linux?
The compiler generates byte code, in the Java sense: a .class file containing the byte code, which is not native machine code, no matter what OS you are running on.
And the JVM interprets the byte code to run it on a specific OS.
The main point of having these two stages and intermediate representation ("byte code" in Java) is platform-independence.
The Java program, in a source code, is platform-independent (to a degree). When you compile it into byte code, the Java compiler basically does (almost) all of the things it can do while still maintaining the platform-independence:
validates syntax
performs static type checks
translates human-readable source code into machine-readable byte code
does static optimizations
etc.
These are all things, that:
maintain platform-independence
only need to be performed once, since they don't rely on any run-time data
take (possibly a long) time, so it would be a waste to do them again each time the code is executed
So now you have .class files with byte code, which are still platform-independent and can be distributed to different OS or even hardware platform.
And then the second step is the JVM. The JVM itself is platform-specific, and its task is to translate/compile/interpret the byte code on the target architecture and OS. That means:
translate byte code to the instruction set of given platform, and using target OS system calls
run-time optimizations
etc.
What-are-the-functions-of-Java-Compiler ?
javac is the Java compiler; it produces byte code (a .class file), and that code is platform-independent.
Is the JVM a compiler or an interpreter? Ans- Interpreter
The JVM (Java Virtual Machine) runs/interprets/translates bytecode into native machine code, and it internally uses a JIT.
The JIT compiles a given bytecode instruction sequence to machine code at runtime before executing it natively, and does all the heavy optimizations.
All the above complexity exists to make Java a compile-once-run-anywhere, platform-independent language. Because of that, the bytecode (javac's output) is platform-independent, but the JVM executing that bytecode is platform-dependent, i.e. we have different JVMs for Windows and Unix.
A JVM, is an implementation of the Java Virtual Machine Specification. It interprets compiled Java binary code (called bytecode) for a computer's processor (or "hardware platform").
JVM is the target machine for byte-code instead of the underlying architecture.
The Java compiler, javac, produces byte code, which is platform-independent. This byte code is, we can say, generic, i.e. it does not include machine-level details specific to each platform.
The instructions in this byte-code cannot be directly run by the CPU.
Therefore some other 'program' is needed which can interpret the code, and give the CPU machine level instructions which it can execute. This program is the 'JVM' (Java Virtual Machine) which is platform specific.
That's why you have different JVM for Windows, Linux or Solaris.
I think first we should discuss the difference between an interpreted runtime, a machine-code runtime, and a bytecode runtime.
In an interpreted runtime, the (generally human-readable) source code is converted to machine code by the interpreter only at the point when the code is run. Generally this confers advantages such as platform independence (so long as an interpreter exists for your platform) and ease of debugging (the source is right there in front of you), but at the cost of relatively slow execution speed, as you have the overhead of converting the source code into machine code when you run the program.
In a compiled runtime the source code has been compiled into native machine code ahead of time by a dedicated compiler. This gives fast execution speed (because the code is already in the format expected by the processor), but means that the thing you distribute (the compiled binary) is generally tied to a given platform.
A bytecode runtime is sort of a halfway house that aims to give the advantages of both interpretation and compilation. In this case the source code is compiled into an intermediate format (byte code) ahead of time and then converted into machine code at runtime. The byte code is designed to be machine-friendly rather than human-friendly, which means it's much faster to convert to machine code than a traditionally interpreted language. Additionally, because the actual conversion to machine code is being done at run time, you still get all that nice platform independence.
Note that the choice of whether to interpret or compile is independent of the language used: for example, there is no reason in theory why you could not have a C interpreter or compile Python directly into machine code. Of course, in practice most languages are generally only either compiled or interpreted.
So this brings us back to the question of what the Java compiler does - essentially its main job is to turn all your nice human-readable .java files into Java bytecode (class files) so that the JVM can efficiently execute them.
The JVM's main job, on the other hand, is to take those class files and turn them into machine code at execution time. Of course it also does other stuff (for example it manages your memory and it provides various standard libraries), but from the point of view of your question it's the conversion to machine code that's important!
Java bytecode is an intermediate, compact way of representing a series of operations. The processor can't execute these directly. A processor executes machine instructions; they are the only thing a processor understands.
The JVM processes a stream of bytecode operations and interprets them into a series of machine instructions for the processor to execute.
My question is: what is the need of generating a .class file? Can't the JVM interpret it directly (from the bytes corresponding to the .java file) without needing a .class file? Does the compiler (javac) do any kind of optimization here?
javac generates the .class file, which is an intermediate form used to achieve platform independence.
To see what the compiler optimized, simply decompile the bytecode, for instance with javap or JD.
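One javac optimization that is easy to observe is constant folding: `javap -c` on the class below would show the single literal "bytecode" rather than a concatenation, and the effect is even visible at runtime through reference equality, since folded string constants are interned. A small demonstration (using `==` deliberately, to compare references rather than contents):

```java
public class FoldingDemo {
    public static void main(String[] args) {
        // javac folds "byte" + "code" into the single literal "bytecode" at
        // compile time, so both sides refer to the same interned String.
        String folded = "byte" + "code";
        System.out.println(folded == "bytecode"); // prints true: folded at compile time

        // With a non-constant operand, the concatenation happens at runtime
        // and builds a brand-new String object.
        String prefix = args.length > 0 ? args[0] : "byte";
        String runtime = prefix + "code";
        System.out.println(runtime == "bytecode"); // prints false: built at runtime
    }
}
```

This mirrors what the decompiled bytecode shows: the first case loads one constant-pool entry, the second emits actual concatenation code.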
This question is the other way around. Can't the compiler generate byte/machine code which can be interpreted by the CPU directly? So why is the JVM needed here? Is the JVM required to interpret the byte code specific to the platform, for example Windows or Linux?
The designers of the Java language decided that the language and the compiled code were going to be platform-independent, but since the code eventually has to run on a physical platform, they opted to put all the platform-dependent code in the JVM. Hence we need the JVM to execute code.
To see what the just in time compiler optimized, activate debug logging and read the disassembled machine code.
Does Java byte code include "processor instruction information"?
DO-178C Table A-6 "Testing of Outputs of Integration Process" states that the “Executable Object Code shall...”, where DO-178 defines object code as the following: “A low-level representation of the computer program not usually in a form directly usable by the target computer but in a form which includes relocation information in addition to the processor instruction information.”
Thus, I'm curious if Java bytecode would fit the DO-178C definition of "object code".
I'm not asking, as has been asked numerous times, the difference between byte code and object - I'm specifically interested in if the Java bytecode contains "processor instruction information".
Thanks a ton for your time and any feedback and insights.
According to Oracle, "JIT compilation of the byte code into native machine code has to occur before a method executes" (http://docs.oracle.com/cd/E15289_01/doc.40/e15058/underst_jit.htm). I guess that means the native machine processor instructions are lacking prior to this point. Based on this, it seems that no, Java bytecode does not include the native machine processor instructions that are present in the object code that comes out of a C compiler.
Moreover, Wikipedia (as much as it can be trusted) states: "Bytecode is not the machine code for any particular computer" (https://en.wikipedia.org/wiki/Just-in-time_compilation). Thus again, this seems to indicate that Java bytecode lacks the "processor instruction information" that is present in C object code.
This is a question of definitions over technical properties, but the answer would be yes. To begin, there are specialized processors that are designed at the gate level to parse and execute JVM bytecode (with some constraints). Even if the bytecode is not run on a physical processor but rather a JVM, the bytecode is the set of instructions for the JVM itself. However, this bytecode may later be converted to processor instructions run natively on the physical processor in use by way of JIT compilation/optimization.
Yes, the bytecode is the processor instruction information.
The platform specific instructions aren't part of the bytecode. The JVM goes through the .class file and does different things depending on what bytecode instruction it is currently looking at (it is acting as a virtual CPU, hence the terminology of virtual machine). This is of course a simplification but you can think of the JVM as a massive switch statement. Some JVMs will analyse several bytecode instructions (an entire method perhaps) to produce some platform specific machine code that is executed by the CPU directly when needed (JIT Compilation).
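The "massive switch statement" idea can be sketched in Java itself. This is a toy stack machine with made-up opcodes (PUSH/ADD/MUL are invented for illustration; real JVM opcodes such as iconst, iadd, and imul are defined in the JVM specification):

```java
// A toy "virtual CPU": a switch over bytecode-like instructions,
// operating on a stack, as the answer describes.
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2;   // hypothetical opcodes

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                                // "program counter"
        while (pc < code.length) {
            switch (code[pc]) {
                case PUSH: stack.push(code[++pc]); break;          // operand follows opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                default:   throw new IllegalStateException("bad opcode: " + code[pc]);
            }
            pc++;
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL};
        System.out.println(run(program)); // prints 20
    }
}
```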
I was wondering if Java gets assembled, and in my readings I found that the compiler creates bytecode which is then run on the Java Virtual Machine. Does the JVM interpret the bytecode and execute it?
This is why I'm confused. Today in class the prof said "the compiler takes a high level language, creates assembly language, then the assembler takes the assembly language and creates machine language (binary) which can be run". So if Java compiles to bytecode how can it be run?
There is a standard compiler setup, such as would be used for the C language, and then there is Java, which is significantly different.
The standard C compiler compiles (through several internal phases) into "machine instructions" which are directly understood by the x86 processor or whatever.
The Java compiler, on the other hand, compiles to what are sometimes called "bytecodes". These are machine instructions, but for an imaginary machine, the Java Virtual Machine. So the JVM interprets the bytecodes just like a "real" machine processes its machine instructions. (The main advantage of this is that a program compiled into bytecodes will run on any JVM, whether it be on an x86 system, an IBM RISC box, or the ARM processor in an Android device -- so long as there's a JVM, the code will run.)
(There have historically been a number of "virtual machines" similar to Java, the UCSD Pascal "P-code" system being one of the more successful ones.)
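You can see the "imaginary machine's" instructions for yourself by disassembling a compiled class with the JDK's javap tool. A minimal sketch (the disassembly shown in the comments is typical `javap -c` output for this method; exact output may vary slightly by compiler version):

```java
// Add.java -- disassemble with:  javac Add.java && javap -c Add
public class Add {
    static int add(int a, int b) {
        return a + b;
        // javap -c typically shows stack-machine instructions like:
        //   iload_0      // push the first int argument
        //   iload_1      // push the second int argument
        //   iadd         // pop two ints, push their sum
        //   ireturn      // return the int on top of the stack
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```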
But it gets more complicated --
Interpreting "bytecodes" is fairly slow and inefficient, so most Java implementations have some sort of scheme to translate the bytecodes into "real" machine instructions. In some cases this is done statically, in a separate compile step, but most often it's done with a "just-in-time compiler" (JITC) which converts small portions of the bytecodes to machine instructions while the application is running. These get to be quite elaborate, with complex schemes to decide which segments of code will benefit most from translating into hardware machine instructions. But they all, for the most part, do their magic without you needing to be aware of what's going on, and without you having to compile your Java code to target a specific type of processor.
Think of bytecode as the machine language of the JVM. (Compilers don't HAVE to produce assembly code which has to be assembled, but they're a lot easier to write that way.)
Just a clarifying note:
That which in Java is called "bytecode" is what your original description calls "machine language (binary) which can be run".
So the answer to how to run java bytecode is:
You build a processor which can handle Java bytecode, in the same way that if you want to execute normal x86 code you build a CPU to handle that.
Java's binary machine language is not really different from the binary instruction format of other CPUs such as x86 or PowerPC, and there do exist CPUs which can execute Java bytecode directly. (Your machine, however, most likely has a normal Intel/AMD CPU.)
Another example: how would you run PowerPC code on a normal Intel CPU? You would build a piece of software which, at runtime, translates the PowerPC binary code to x86 code. The case for Java is not really that different. So to run Java code on an x86 CPU, you need a program which translates the Java binary code (aka the bytecode) to x86 binary code. This is what the JVM* does. And it does this either by interpreting the Java instructions one at a time, or by translating a large chunk of instructions at a time (called JIT). Exactly how the JVM handles the translation depends on which JVM implementation you use and its settings. (There are multiple independent implementations of JVMs which implement their translation in different ways.)
But there is one thing which makes Java a bit different. Unlike other binary instruction formats such as x86, Java was never really designed to run directly on a CPU. Its binary format is designed in a way which makes it easy to translate into binary code for "normal" CPUs such as x86 or PowerPC.
*The JVM does in fact handle more than just translating the Java binary code to processor-dependent code. It also handles memory allocation for Java programs, and it handles communication between a Java program and the user's operating system. This is done to keep the Java program relatively independent of the user's operating system and platform details.
In a short explanation: The JVM translates the Java Byte Code into machine specific code. The generated machine specific code is then executed by the machine.
The Java compiler translates Java into bytecode. The JVM translates bytecode into machine-specific code (often loosely called "assembly") at runtime. The machine executes that machine-specific code.
I couldn't find a clear explanation of the difference between JIT compilers and interpreters.
JIT is intermediary between interpreters and compilers. During runtime, it converts bytecode to machine code (for the JVM or the actual machine?). The next time, it takes the compiled code from the cache and runs it.
Am I right?
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
How will the real processor in our PC understand the instructions?
Please clear my doubts.
First things first:
With the JVM, both the interpreter and the compiler (the JVM's compiler, not a source-code compiler like javac) produce native code (aka machine language code for the underlying physical CPU, such as x86) from bytecode.
What's the difference, then?
The difference is in how they generate the native code, how optimized it is, as well as how costly the optimization is. Informally, an interpreter pretty much converts each bytecode instruction to the corresponding native instruction(s) by looking up a predefined JVM-instruction-to-machine-instruction mapping (see the picture below). Interestingly, a further speedup in execution can be achieved if we take a section of bytecode and convert it into machine code, because considering a whole logical section often provides room for optimization, as opposed to converting (interpreting) each instruction in isolation. This very act of converting a section of bytecode into (presumably optimized) machine instructions is called compiling (in the current context). When the compilation is done at run time, the compiler is called a JIT compiler.
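A toy illustration of that difference, with ordinary Java standing in for native code and invented stack operations standing in for bytecode: interpreting executes every stack operation each time, while "compiling" the whole section lets us fold the constant expression (2 + 3) * 4 down to 20 once.

```java
// Contrast: per-instruction interpretation vs. compiling a section.
import java.util.ArrayDeque;
import java.util.Deque;

public class InterpretVsCompile {
    // Interpret (2 + 3) * 4 one stack instruction at a time.
    static int interpreted() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(2);                                 // PUSH 2
        stack.push(3);                                 // PUSH 3
        stack.push(stack.pop() + stack.pop());         // ADD
        stack.push(4);                                 // PUSH 4
        stack.push(stack.pop() * stack.pop());         // MUL
        return stack.pop();
    }

    // "Compiled": the whole section seen at once and optimized
    // (here, constant-folded) to a single result.
    static int compiled() {
        return 20;
    }

    public static void main(String[] args) {
        System.out.println(interpreted() == compiled()); // prints true
    }
}
```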
The correlation and coordination:
Since the Java designers went for (hardware & OS) portability, they chose an interpreter architecture (as opposed to C-style compiling, assembling, and linking). However, in order to achieve more speed, a compiler is also optionally added to a JVM. Nonetheless, as a program goes on being interpreted (and executed on the physical CPU), "hotspots" are detected by the JVM and statistics are gathered. Consequently, using statistics from the interpreter, those sections become candidates for compilation (to optimized native code). This is in fact done on the fly (thus "JIT compiler"), and the compiled machine instructions are used subsequently (rather than being interpreted). Naturally, the JVM also caches such compiled pieces of code.
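A minimal sketch of the counting idea. The threshold and the two code paths are invented for illustration; real JVMs use invocation counters (plus loop back-edge counters) to pick compilation candidates, and the "compiled" path here is just another Java lambda standing in for JIT output:

```java
// Hotspot detection sketch: count invocations and, past a threshold,
// switch from the "interpreted" path to the "compiled" one.
import java.util.function.IntUnaryOperator;

public class HotspotSketch {
    static final int THRESHOLD = 10_000;   // hypothetical hotness threshold
    static int invocations = 0;

    static final IntUnaryOperator slowInterpreted = x -> x * x; // stands in for interpretation
    static final IntUnaryOperator fastCompiled    = x -> x * x; // stands in for JIT output
    static IntUnaryOperator current = slowInterpreted;

    static int square(int x) {
        if (++invocations == THRESHOLD) {
            current = fastCompiled;        // "compile" the hot method once
        }
        return current.applyAsInt(x);      // subsequent calls use the cached path
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) square(i);
        System.out.println(current == fastCompiled); // prints true
    }
}
```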
Words of caution:
These are pretty much the fundamental concepts. If an actual JVM implementer does it a bit differently, don't be surprised. The same goes for VMs in other languages.
Words of caution:
Statements like "interpreter executes byte code in virtual processor", "interpreter executes byte code directly", etc. are all correct as long as you understand that in the end there is a set of machine instructions that have to run in a physical hardware.
Some good references: [I've not done an extensive search, though]
[paper] "Instruction Folding in a Hardware-Translation Based Java Virtual Machine" by Hitoshi Oi
[book] Computer Organization and Design, 4th ed., D. A. Patterson (see Fig. 2.23)
[web article] "JVM performance optimization, Part 2: Compilers" by Eva Andreasson (JavaWorld)
PS: I've used the following terms interchangeably: 'native code', 'machine language code', 'machine instructions', etc.
Interpreter: Reads your source code or some intermediate representation (bytecode) of it, and executes it directly.
JIT compiler: Reads your source code, or more typically some intermediate representation (bytecode) of it, compiles that on the fly and executes native code.
JIT is intermediary between interpreters and compilers. During runtime, it converts bytecode to machine code (JVM or actual machine?). The next time, it takes it from the cache and runs it. Am I right?
Yes, you are.
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
Yes, it is.
How will the real processor in our PC understand the instructions?
In the case of interpreters, the virtual machine executes a native JVM procedure corresponding to each instruction in the bytecode to produce the expected behaviour. But your code isn't actually compiled to native code, as with JIT compilers. The JVM emulates the expected behaviour for each instruction.
A JIT compiler translates bytecode into machine code and then executes the machine code.
Interpreters read your high-level language (interpret it) and execute what's asked by your program. Interpreters normally do not pass through byte-compilation or JIT compilation.
But the two worlds have merged, because numerous interpreters have taken the path of internal byte-compilation and JIT compilation, for better execution speed.
Interpreter: interprets the bytecode; if a method is called multiple times, a new interpretation is required every time.
JIT: when code is called multiple times, JIT converts the bytecode into native code and executes it.
I'm pretty sure that JIT turns bytecode into machine code for whatever machine you're running on, right as it's needed. The alternative to this is to run the bytecode in a Java virtual machine. I'm not sure if this is the same as interpreting the code, since I'm more familiar with that term being used to describe the execution of a scripting (non-compiled) language like Ruby or Perl.
The first time a class is referenced in the JVM, the JIT execution engine re-compiles the .class files (the primary binaries) generated by the Java compiler, which contain the JVM instruction set, into binaries containing the host system's instruction set. JIT stores and reuses those recompiled binaries from memory going forward, thereby reducing interpretation time and benefiting from native code execution.
And there is another flavor which does adaptive optimization: it identifies the most reused parts of the app and applies JIT only to them, thereby optimizing memory usage.
On the other hand, a plain old Java interpreter interprets one JVM instruction from the class file at a time and calls a procedure against it.
Find a detailed comparison here