Technial confusion between compilation and interpretation - java

I've read many definitions and statements about "Interpretation" and "compilation". But I am still very much confused.
Technically speaking, what is REALLY the difference between interpretation and compilation under the hood? Let me elaborate (please correct any wrong concept I might have) :
In java, the source code is "compiled" into ByteCode which is then "interpreted" and/or "just-in-time compiled" into machine code. But what is the difference between just in time compilation and interpretation? I mean, in the end, as far as my guess goes, the Host's CPU will run machine code only. Thus, in interpretation as well, instructions ARE being converted into machine code which can be understood by the CPU. So, where do we draw the line between just-in-time compilation and interpretation?
P.S. This is my conception. It might be totally wrong. In that case, Kindly excuse my stupidity and correct me.
Thanks.

1. Frankly speaking the idea that java has both Compiler and Interpreter is a myth, its the behavior of it that is marked as Compilation and Interpreter.
2. Java compiler compiles the human readable code to byte code. Which then is converted by the JIT (Just In Time Compiler) during runtime into machine level executable code.
3. During Runtime JIT identifies the runtime intensive part of the code and then converts it into machine level executable code, this part of the code is known as Hot-Spot, and thats why JIT is called as Hot-Spot compiler.
4. JIT uses the Virtual Memory Table ( V-table), which is a pointer to the method in the class. The Hot-Spot code is then converted to its machine level executable code, its address is stored here, and when this part is called again, then its directly fetched by this stored address. This behavior of JIT to keep compiling small amount of code during Run time is assumed to be Interpreted Behavior, And the JIT behaviour of storing this for later use is assumed as Compilation.
5. Virtual Memory Table also has a table which stores the address of the byte code, which can be used if needed.

When the code is compiled, the generated artifact is understandable directly by the hardware. Basically it's a machine code sent directly to the CPU. This also means that an artifact compiled against given CPU architecture won't run on another. The advantage is immediate startup and great performance.
In interpreted environments there is either no compilation at all or the result of such step is an intermediary code. This code is two abstract to be sent directly to processing unit. Instead a separate layer is needed (virtual machine, interpreter) that reads this artifact and executes it in some sandbox environment. The advantage of this approach is portability - intermediate code can run on any platform where native interpreter is available. Unfortunately the performance is almost always worse.
JIT in Java is a hybrid technology. First bytecode is interpreted, each bytecode instruction is executed by the interpreter. However at some point in time (and under some conditions) bytecode is translated into machine code and sent directly to CPU to improve performance. This approach brings best of both worlds - portability of intermediate code and speed of native code. Moreover, the JIT knows much more about runtime behaviour of your code (how many times given loop is called on average? Is this method really virtual?), so the machine code can be even faster then the one generated by an ordinary compiler (!)

You're right that eventually everything has to be converted to machine code. The basic difference is that in the case of an interpreter, this translation occurs every time the code runs, whereas a compiler does this translation ahead of time, after which the compiler is not required for running the program.
Just-in-time compilation is a combination of both, where the JIT compiler is still required for running the program, and the code is compiled at run-time.
Compilation takes time but it is advantageous when the same piece of code is run several times, for eg. in a loop. The Java HotSpot VM takes this approach further by initially interpreting bytecode directly and then JIT-compiling a piece of code once it has run a certain number of times.

Interpreters interpret code line by line, and decides the machine code at run time;
Compiler consume code by chunk, and decides the machine code at compile time;
JIT compiler is a hybrid approach, in which the code is generated at run time (but could be already cached to improve performance), but is consumed in chunk.

An interpreted environment involves instructions being executed immediately after parsing, where both the parsing and execution are done by the interpreter. This means that the machine you run the code on must have the interpreter in order to run the program.1
A compiler will parse the instructions into machine code and store them for later execution. Java however is bytecompiled2, which means that this process turns the instructions into ByteCode, which will then be used by the interpreter.

Related

Why doesn't JIT compiler compile bytecode in advance? [duplicate]

What does a JIT compiler specifically do as opposed to a non-JIT compiler? Can someone give a succinct and easy to understand description?
A JIT compiler runs after the program has started and compiles the code (usually bytecode or some kind of VM instructions) on the fly (or just-in-time, as it's called) into a form that's usually faster, typically the host CPU's native instruction set. A JIT has access to dynamic runtime information whereas a standard compiler doesn't and can make better optimizations like inlining functions that are used frequently.
This is in contrast to a traditional compiler that compiles all the code to machine language before the program is first run.
To paraphrase, conventional compilers build the whole program as an EXE file BEFORE the first time you run it. For newer style programs, an assembly is generated with pseudocode (p-code). Only AFTER you execute the program on the OS (e.g., by double-clicking on its icon) will the (JIT) compiler kick in and generate machine code (m-code) that the Intel-based processor or whatever will understand.
In the beginning, a compiler was responsible for turning a high-level language (defined as higher level than assembler) into object code (machine instructions), which would then be linked (by a linker) into an executable.
At one point in the evolution of languages, compilers would compile a high-level language into pseudo-code, which would then be interpreted (by an interpreter) to run your program. This eliminated the object code and executables, and allowed these languages to be portable to multiple operating systems and hardware platforms. Pascal (which compiled to P-Code) was one of the first; Java and C# are more recent examples. Eventually the term P-Code was replaced with bytecode, since most of the pseudo-operations are a byte long.
A Just-In-Time (JIT) compiler is a feature of the run-time interpreter, that instead of interpreting bytecode every time a method is invoked, will compile the bytecode into the machine code instructions of the running machine, and then invoke this object code instead. Ideally the efficiency of running object code will overcome the inefficiency of recompiling the program every time it runs.
JIT-Just in time
the word itself says when it's needed (on demand)
Typical scenario:
The source code is completely converted into machine code
JIT scenario:
The source code will be converted into assembly language like structure [for ex IL (intermediate language) for C#, ByteCode for java].
The intermediate code is converted into machine language only when the application needs that is required codes are only converted to machine code.
JIT vs Non-JIT comparison:
In JIT not all the code is converted into machine code first a part
of the code that is necessary will be converted into machine code
then if a method or functionality called is not in machine then that
will be turned into machine code... it reduces burden on the CPU.
As the machine code will be generated on run time....the JIT
compiler will produce machine code that is optimised for running
machine's CPU architecture.
JIT Examples:
In Java JIT is in JVM (Java Virtual Machine)
In C# it is in CLR (Common Language Runtime)
In Android it is in DVM (Dalvik Virtual Machine), or ART (Android RunTime) in newer versions.
As other have mentioned
JIT stands for Just-in-Time which means that code gets compiled when it is needed, not before runtime.
Just to add a point to above discussion JVM maintains a count as of how many time a function is executed. If this count exceeds a predefined limit JIT compiles the code into machine language which can directly be executed by the processor (unlike the normal case in which javac compile the code into bytecode and then java - the interpreter interprets this bytecode line by line converts it into machine code and executes).
Also next time this function is calculated same compiled code is executed again unlike normal interpretation in which the code is interpreted again line by line. This makes execution faster.
JIT compiler only compiles the byte-code to equivalent native code at first execution. Upon every successive execution, the JVM merely uses the already compiled native code to optimize performance.
Without JIT compiler, the JVM interpreter translates the byte-code line-by-line to make it appear as if a native application is being executed.
Source
JIT stands for Just-in-Time which means that code gets compiled when it is needed, not before runtime.
This is beneficial because the compiler can generate code that is optimised for your particular machine. A static compiler, like your average C compiler, will compile all of the code on to executable code on the developer's machine. Hence the compiler will perform optimisations based on some assumptions. It can compile more slowly and do more optimisations because it is not slowing execution of the program for the user.
After the byte code (which is architecture neutral) has been generated by the Java compiler, the execution will be handled by the JVM (in Java). The byte code will be loaded in to JVM by the loader and then each byte instruction is interpreted.
When we need to call a method multiple times, we need to interpret the same code many times and this may take more time than is needed. So we have the JIT (just-in-time) compilers. When the byte has been is loaded in to JVM (its run time), the whole code will be compiled rather than interpreted, thus saving time.
JIT compilers works only during run time, so we do not have any binary output.
A just in time compiler (JIT) is a piece of software which takes receives an non executable input and returns the appropriate machine code to be executed. For example:
Intermediate representation JIT Native machine code for the current CPU architecture
Java bytecode ---> machine code
Javascript (run with V8) ---> machine code
The consequence of this is that for a certain CPU architecture the appropriate JIT compiler must be installed.
Difference compiler, interpreter, and JIT
Although there can be exceptions in general when we want to transform source code into machine code we can use:
Compiler: Takes source code and returns a executable
Interpreter: Executes the program instruction by instruction. It takes an executable segment of the source code and turns that segment into machine instructions. This process is repeated until all source code is transformed into machine instructions and executed.
JIT: Many different implementations of a JIT are possible, however a JIT is usually a combination of a compiler and an interpreter. The JIT first turn intermediary data (e.g. Java bytecode) which it receives into machine language via interpretation. A JIT can often measures when a certain part of the code is executed often and the will compile this part for faster execution.
Just In Time Compiler (JIT) :
It compiles the java bytecodes into machine instructions of that specific CPU.
For example, if we have a loop statement in our java code :
while(i<10){
// ...
a=a+i;
// ...
}
The above loop code runs for 10 times if the value of i is 0.
It is not necessary to compile the bytecode for 10 times again and again as the same instruction is going to execute for 10 times. In that case, it is necessary to compile that code only once and the value can be changed for the required number of times. So, Just In Time (JIT) Compiler keeps track of such statements and methods (as said above before) and compiles such pieces of byte code into machine code for better performance.
Another similar example , is that a search for a pattern using "Regular Expression" in a list of strings/sentences.
JIT Compiler doesn't compile all the code to machine code. It compiles code that have a similar pattern at run time.
See this Oracle documentation on Understand JIT to read more.
You have code that is compliled into some IL (intermediate language). When you run your program, the computer doesn't understand this code. It only understands native code. So the JIT compiler compiles your IL into native code on the fly. It does this at the method level.
I know this is an old thread, but runtime optimization is another important part of JIT compilation that doesn't seemed to be discussed here. Basically, the JIT compiler can monitor the program as it runs to determine ways to improve execution. Then, it can make those changes on the fly - during runtime. Google JIT optimization (javaworld has a pretty good article about it.)
just-in-time (JIT) compilation, (also dynamic translation or run-time compilation), is a way of executing computer code that involves compilation during execution of a program – at run time – rather than prior to execution.
IT compilation is a combination of the two traditional approaches to translation to machine code – ahead-of-time compilation (AOT), and interpretation – and combines some advantages and drawbacks of both. JIT compilation combines the speed of compiled code with the flexibility of interpretation.
Let's consider JIT used in JVM,
For example, the HotSpot JVM JIT compilers generate dynamic optimizations. In other words, they make optimization decisions while the Java application is running and generate high-performing native machine instructions targeted for the underlying system architecture.
When a method is chosen for compilation, the JVM feeds its bytecode to the Just-In-Time compiler (JIT). The JIT needs to understand the semantics and syntax of the bytecode before it can compile the method correctly. To help the JIT compiler analyze the method, its bytecode are first reformulated in an internal representation called trace trees, which resembles machine code more closely than bytecode. Analysis and optimizations are then performed on the trees of the method. At the end, the trees are translated into native code.
A trace tree is a data structure that is used in the runtime compilation of programming code. Trace trees are used in a type of 'just in time compiler' that traces code executing during hotspots and compiles it. Refer this.
Refer :
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
https://en.wikipedia.org/wiki/Just-in-time_compilation
A non-JIT compiler takes source code and transforms it into machine specific byte code at compile time. A JIT compiler takes machine agnostic byte code that was generated at compile time and transforms it into machine specific byte code at run time. The JIT compiler that Java uses is what allows a single binary to run on a multitude of platforms without modification.
Jit stands for just in time compiler
jit is a program that turns java byte code into instruction that can be sent directly to the processor.
Using the java just in time compiler (really a second compiler) at the particular system platform complies the bytecode into particular system code,once the code has been re-compiled by the jit complier ,it will usually run more quickly in the computer.
The just-in-time compiler comes with the virtual machine and is used optionally. It compiles the bytecode into platform-specific executable code that is immediately executed.
20% of the byte code is used 80% of the time. The JIT compiler gets these stats and optimizes this 20% of the byte code to run faster by adding inline methods, removal of unused locks etc and also creating the bytecode specific to that machine. I am quoting from this article, I found it was handy. http://java.dzone.com/articles/just-time-compiler-jit-hotspot
Just In Time compiler also known as JIT compiler is used for
performance improvement in Java. It is enabled by default. It is
compilation done at execution time rather earlier.
Java has popularized the use of JIT compiler by including it in
JVM.
JIT refers to execution engine in few of JVM implementations, one that is faster but requires more memory,is a just-in-time compiler. In this scheme, the bytecodes of a method are compiled to native machine code the first time the method is invoked. The native machine code for the method is then cached, so it can be re-used the next time that same method is invoked.
JVM actually performs compilation steps during runtime for performance reasons. This means that Java doesn't have a clean compile-execution separation. It first does a so called static compilation from Java source code to bytecode. Then this bytecode is passed to the JVM for execution. But executing bytecode is slow so the JVM measures how often the bytecode is run and when it detects a "hotspot" of code that's run very frequently it performs dynamic compilation from bytecode to machinecode of the "hotspot" code (hotspot profiler). So effectively today Java programs are run by machinecode execution.

Besides caching instructions, is there any difference between the native code generated by an interpreter and JIT?

I fail to understand the difference between an interpreter and JIT. For instance, from this answer:
JVM is Java Virtual Machine -- Runs/ Interprets/ translates Bytecode
into Native Machine Code
JIT is Just In Time Compiler -- Compiles the given bytecode
instruction sequence to machine code at runtime before executing
it natively. It's main purpose is to do heavy optimizations in
performance.
Both produce native machine code. Then, from this other answer:
An interpreter generates and executes machine code instructions on the
fly for each instruction, regardless of whether it has previously been
executed. A JIT caches the instructions that have been previously
interpreted to machine code, and reuses those native machine code
instructions.
As I see it, an interpreter is similar to a JIT in that it also translates the bytecode into native code, and the difference is that JIT performs some optimization, like caching.
Is this right? Is there any other major difference?
I think the above definitions aren't necessarily true.
It isn't "mandatory" or "necessary" that an interpreter translates into machine code.
In essence, an interpreter interprets. It finds a loop, and then "runs" that loop. That is not the same as creating machine code that executes a loop.
This statement:
An interpreter generates and executes machine code instructions
Is false.
Simply put, an interpreter is a program that loops over the instructions of a program (be they from a virtual or real instruction set), and executes them one by one. This is done by programming out what each instruction should do and simulating that within the interpreter.
On the most simple level, you could imagine an interpreter looks something like this in general:
for(byte byteCode : program) {
if(byteCode == ADD_BYTECODE) {
add();
}
// ... others
}
This is not so different a concept from how a CPU executes machine code, but in the case of a CPU, most of the logic is implemented in hardware directly.
I suppose you could say that an interpreter is a program that simulates a CPU in software.
The JIT compiler does the job of translating byte code into machine code and optimizing it along the way too. One of the theoretical advantages of machine code over byte code is for instance that a particular CPU might have specialized instructions available that run faster than the byte code equivalent.
In the case of the JVM this is done when a method is "hot", i.e. when it is ran a lot. JIT compilation takes a long time however (try running a Java program with the -XX:-TieredCompilation -Xcomp flags, which force C2 compilation by default, you'll see the difference in startup time), so it's faster to interpret the byte code first. That also gives the opportunity to collect profiling data, which is data about how the program is running (e.g. how many times an if branch is triggered, or which types are used for dynamic dispatch calls). The profiling data is also used during JIT compilation to do better optimization.

Why is java bytecode interpreted?

As far as I understand Java compiles to Java bytecode, which can then be interpreted by any machine running Java for its specific CPU. Java uses JIT to interpret the bytecode, and I know it's gotten really fast at doing so, but why doesn't/didn't the language designers just statically compile down to machine instructions once it detects the particular machine it's running on? Is the bytecode interpreted every single pass through the code?
The original design was in the premise of "compile once run anywhere". So every implementer of the virtual machine can run the bytecodes generated by a compiler.
In the book Masterminds for Programming, James Gosling explained:
James: Exactly. These days we’re
beating the really good C and C++
compilers pretty much always. When you
go to the dynamic compiler, you get
two advantages when the compiler’s
running right at the last moment. One
is you know exactly what chipset
you’re running on. So many times when
people are compiling a piece of C
code, they have to compile it to run
on kind of the generic x86
architecture. Almost none of the
binaries you get are particularly well
tuned for any of them. You download
the latest copy of Mozilla,and it’ll
run on pretty much any Intel
architecture CPU. There’s pretty much
one Linux binary. It’s pretty generic,
and it’s compiled with GCC, which is
not a very good C compiler.
When HotSpot runs, it knows exactly
what chipset you’re running on. It
knows exactly how the cache works. It
knows exactly how the memory hierarchy
works. It knows exactly how all the
pipeline interlocks work in the CPU.
It knows what instruction set
extensions this chip has got. It
optimizes for precisely what machine
you’re on. Then the other half of it
is that it actually sees the
application as it’s running. It’s able
to have statistics that know which
things are important. It’s able to
inline things that a C compiler could
never do. The kind of stuff that gets
inlined in the Java world is pretty
amazing. Then you tack onto that the
way the storage management works with
the modern garbage collectors. With a
modern garbage collector, storage
allocation is extremely fast.
Java is commonly compiled to machine instructions; that's what just-in-time (JIT) compilation is. But Sun's Java implementation by default only does that for code that is run often enough (so startup and shutdown bytecode, that is executed only once, is still interpreted to prevent JIT overhead).
Bytecode interpretation is usually "fast enough" for a lot of cases. Compiling, on the other hand, is rather expensive. If 90% of the runtime is spent in 1% of the code it's far better to just compile that 1% and leave the other 99% alone.
Static compiling can blow up on you because all the other libraries you use also need to be write-once run everywhere (i.e. byte-code), including all of their dependencies. This can lead to a chain of compilations following dependencies that can blow up on you. Compiling only the code as (while running) the runtime discovers it actually needs that section of code compiled is the general idea I think. There may be many code paths you don't actually follow, especially when libraries come into question.
Java Bytecode is interpreted because bytecodes are portable across various platforms.JVM, which is platform dependent,converts and executes bytecodes to specific instruction set of that machine whether it may be a Windows or LINUX or MAC etc...
One important difference of dynamic compiling is that it optimises the code base don how it is run. There is an option -XX:CompileThreshold= which is 10000 by default. You can decrease this so it optimises the code sooner, but if you run a complex application or benchmark, you can find that reducing this number can result in slower code. If you run a simple benchmark, you may not find it makes any difference.
One example where dynamic compiling has an advantage over static compiling is inlining "virtual" methods, esp those which can be replaced. For example, the JVM can inline up to two heavily used "virtual" methods, which may be in a separate jar compiled after the caller was compiled. The called jar(s) can even be removed from the running system e.g. OSGi and have another jar added or replace it. The replacement JAR's methods can then be inlined. This can only be achieved with dynamic compiling.

JIT vs Interpreters

I couldn't find the difference between JIT and Interpreters.
Jit is intermediary to Interpreters and Compilers. During runtime, it converts byte code to machine code ( JVM or Actual Machine ?) For the next time, it takes from the cache and runs
Am I right?
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
How the real processor in our pc will understand the instruction.?
Please clear my doubts.
First thing first:
With JVM, both interpreter and compiler (the JVM compiler and not the source-code compiler like javac) produce native code (aka Machine language code for the underlying physical CPU like x86) from byte code.
What's the difference then:
The difference is in how they generate the native code, how optimized it is as well how costly the optimization is. Informally, an interpreter pretty much converts each byte-code instruction to corresponding native instruction by looking up a predefined JVM-instruction to machine instruction mapping (see below pic). Interestingly, a further speedup in execution can be achieved, if we take a section of byte-code and convert it into machine code - because considering a whole logical section often provides rooms for optimization as opposed to converting (interpreting) each line in isolation (to machine instruction). This very act of converting a section of byte-code into (presumably optimized) machine instruction is called compiling (in the current context). When the compilation is done at run-time, the compiler is called JIT compiler.
The co-relation and co-ordination:
Since Java designer went for (hardware & OS) portability, they had chosen interpreter architecture (as opposed to c style compiling, assembling, and linking). However, in order to achieve more speed up, a compiler is also optionally added to a JVM. Nonetheless, as a program goes on being interpreted (and executed in physical CPU) "hotspot"s are detected by JVM and statistics are generated. Consequently, using statistics from interpreter, those sections become candidate for compilation (optimized native code). It is in fact done on-the-fly (thus JIT compiler) and the compiled machine instructions are used subsequently (rather than being interpreted). In a natural way, JVM also caches such compiled pieces of code.
Words of caution:
These are pretty much the fundamental concepts. If an actual implementer of JVM, does it a bit different way, don't get surprised. So could be the case for VM's in other languages.
Words of caution:
Statements like "interpreter executes byte code in virtual processor", "interpreter executes byte code directly", etc. are all correct as long as you understand that in the end there is a set of machine instructions that have to run in a physical hardware.
Some Good References: [I've not done extensive search though]
[paper] Instruction Folding in a Hardware-Translation Based Java Virtual
Machine by Hitoshi Oi
[book] Computer organization and design, 4th ed, D. A. Patterson. (see Fig 2.23)
[web-article] JVM performance optimization, Part 2: Compilers, by Eva Andreasson (JavaWorld)
PS: I've used following terms interchangebly - 'native code', 'machine language code', 'machine instructions', etc.
Interpreter: Reads your source code or some intermediate representation (bytecode) of it, and executes it directly.
JIT compiler: Reads your source code, or more typically some intermediate representation (bytecode) of it, compiles that on the fly and executes native code.
Jit is intermediary to Interpreters and Compilers. During runtime, it converts byte code to machine code ( JVM or Actual Machine ?) For the next time, it takes from the cache and runs Am i right?
Yes you are.
Interpreters will directly execute bytecode without transforming it into machine code. Is that right?
Yes, it is.
How the real processor in our pc will understand the instruction.?
In the case of interpreters, the virtual machine executes a native JVM procedure corresponding to each instruction in byte code to produce the expected behaviour. But your code isn't actually compiled to native code, as with Jit compilers. The JVM emulates the expected behaviour for each instruction.
A JIT Compiler translates byte code into machine code and then execute the machine code.
Interpreters read your high level language (interprets it) and execute what's asked by your program. Interpreters are normally not passing through byte-code and jit compilation.
But the two worlds have melt because numerous interpreters have take the path to internal byte-compilation and jit-compilation, for a better speed of execution.
Interpreter: Interprets the bytecode if a method is called multiple times every time a new interpretation is required.
JIT: when a code is called multiple time JIT converts the bytecode in native code and executes it
I'm pretty sure that JIT turns byte code into machine code for whatever machine you're running on right as it's needed. The alternative to this is to run the byte code in a java virtual machine. I'm not sure if this the same as interpreting the code since I'm more familiar with that term being used to describe the execution of a scripting (non-compiled) language like ruby or perl.
The first time a class is referenced in JVM the JIT Execution Engine re-compiles the .class files (primary Binaries) generated by Java Compiler containing JVM Instruction Set to Binaries containing HOST system’s Instruction Set. JIT stores and reuses those recompiled binaries from Memory going forward, there by reducing interpretation time and benefits from Native code execution.
And there is another flavor which does Adaptive Optimization by identifying most reused part of the app and applying JIT only over it, there by optimizing over memory usage.
On the other hand a plain old java interpreter interprets one JVM instruction from class file at a time and calls a procedure against it.
Find a detail comparison here

Compiled vs. Interpreted Languages

I'm trying to get a better understanding of the difference. I've found a lot of explanations online, but they tend towards the abstract differences rather than the practical implications.
Most of my programming experiences has been with CPython (dynamic, interpreted), and Java (static, compiled). However, I understand that there are other kinds of interpreted and compiled languages. Aside from the fact that executable files can be distributed from programs written in compiled languages, are there any advantages/disadvantages to each type? Oftentimes, I hear people arguing that interpreted languages can be used interactively, but I believe that compiled languages can have interactive implementations as well, correct?
A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition "+" operation in your source code could be translated directly to the "ADD" instruction in machine code.
An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same "+" operation would be recognised by the interpreter at run time, which would then call its own "add(a,b)" function with the appropriate arguments, which would then execute the machine code "ADD" instruction.
You can do anything that you can do in an interpreted language in a compiled language and vice-versa - they are both Turing complete. Both however have advantages and disadvantages for implementation and use.
I'm going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:
Faster performance by directly using the native code of the target machine
Opportunity to apply quite powerful optimisations during the compile stage
And here are the advantages of interpreted languages:
Easier to implement (writing good compilers is very hard!!)
No need to run a compilation stage: can execute code directly "on the fly"
Can be more convenient for dynamic languages
Note that modern techniques such as bytecode compilation add some extra complexity - what happens here is that the compiler targets a "virtual machine" which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
A language itself is neither compiled nor interpreted, only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpeter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?
Other languages, which are compiled as well as interpreted, are Scala, Haskell or Ocaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.
So generally categorizing languages by "compiled" and "interpreted" doesn't make much sense.
Start thinking in terms of a: blast from the past
Once upon a time, long long ago, there lived in the land of computing
interpreters and compilers. All kinds of fuss ensued over the merits of
one over the other. The general opinion at that time was something along the lines of:
Interpreter: Fast to develop (edit and run). Slow to execute because each statement had to be interpreted into
machine code every time it was executed (think of what this meant for a loop executed thousands of times).
Compiler: Slow to develop (edit, compile, link and run. The compile/link steps could take serious time). Fast
to execute. The whole program was already in native machine code.
A one or two order of magnitude difference in the runtime
performance existed between an interpreted program and a compiled program. Other distinguishing
points, run-time mutability of the code for example, were also of some interest but the major
distinction revolved around the run-time performance issues.
Today the landscape has evolved to such an extent that the compiled/interpreted distinction is
pretty much irrelevant. Many
compiled languages call upon run-time services that are not
completely machine code based. Also, most interpreted languages are "compiled" into byte-code
before execution. Byte-code interpreters can be very efficient and rival some compiler generated
code from an execution speed point of view.
The classic difference is that compilers generated native machine code, interpreters read source code and
generated machine code on the fly using some sort of run-time system.
Today there are very few classic interpreters left - almost all of them
compile into byte-code (or some other semi-compiled state) which then runs on a virtual "machine".
The extreme and simple cases:
A compiler will produce a binary executable in the target machine's native executable format. This binary file contains all required resources except for system libraries; it's ready to run with no further preparation and processing and it runs like lightning because the code is the native code for the CPU on the target machine.
An interpreter will present the user with a prompt in a loop where he can enter statements or code, and upon hitting RUN or the equivalent the interpreter will examine, scan, parse and interpretatively execute each line until the program runs to a stopping point or an error. Because each line is treated on its own and the interpreter doesn't "learn" anything from having seen the line before, the effort of converting human-readable language to machine instructions is incurred every time for every line, so it's dog slow. On the bright side, the user can inspect and otherwise interact with his program in all kinds of ways: Changing variables, changing code, running in trace or debug modes... whatever.
With those out of the way, let me explain that life ain't so simple any more. For instance,
Many interpreters will pre-compile the code they're given so the translation step doesn't have to be repeated again and again.
Some compilers compile not to CPU-specific machine instructions but to bytecode, a kind of artificial machine code for a ficticious machine. This makes the compiled program a bit more portable, but requires a bytecode interpreter on every target system.
The bytecode interpreters (I'm looking at Java here) recently tend to re-compile the bytecode they get for the CPU of the target section just before execution (called JIT). To save time, this is often only done for code that runs often (hotspots).
Some systems that look and act like interpreters (Clojure, for instance) compile any code they get, immediately, but allow interactive access to the program's environment. That's basically the convenience of interpreters with the speed of binary compilation.
Some compilers don't really compile, they just pre-digest and compress code. I heard a while back that's how Perl works. So sometimes the compiler is just doing a bit of the work and most of it is still interpretation.
In the end, these days, interpreting vs. compiling is a trade-off, with time spent (once) compiling often being rewarded by better runtime performance, but an interpretative environment giving more opportunities for interaction. Compiling vs. interpreting is mostly a matter of how the work of "understanding" the program is divided up between different processes, and the line is a bit blurry these days as languages and products try to offer the best of both worlds.
From http://www.quora.com/What-is-the-difference-between-compiled-and-interpreted-programming-languages
There is no difference, because “compiled programming language” and
“interpreted programming language” aren’t meaningful concepts. Any
programming language, and I really mean any, can be interpreted or
compiled. Thus, interpretation and compilation are implementation
techniques, not attributes of languages.
Interpretation is a technique whereby another program, the
interpreter, performs operations on behalf of the program being
interpreted in order to run it. If you can imagine reading a program
and doing what it says to do step-by-step, say on a piece of scratch
paper, that’s just what an interpreter does as well. A common reason
to interpret a program is that interpreters are relatively easy to
write. Another reason is that an interpreter can monitor what a
program tries to do as it runs, to enforce a policy, say, for
security.
Compilation is a technique whereby a program written in one language
(the “source language”) is translated into a program in another
language (the “object language”), which hopefully means the same thing
as the original program. While doing the translation, it is common for
the compiler to also try to transform the program in ways that will
make the object program faster (without changing its meaning!). A
common reason to compile a program is that there’s some good way to
run programs in the object language quickly and without the overhead
of interpreting the source language along the way.
You may have guessed, based on the above definitions, that these two
implementation techniques are not mutually exclusive, and may even be
complementary. Traditionally, the object language of a compiler was
machine code or something similar, which refers to any number of
programming languages understood by particular computer CPUs. The
machine code would then run “on the metal” (though one might see, if
one looks closely enough, that the “metal” works a lot like an
interpreter). Today, however, it’s very common to use a compiler to
generate object code that is meant to be interpreted—for example, this
is how Java used to (and sometimes still does) work. There are
compilers that translate other languages to JavaScript, which is then
often run in a web browser, which might interpret the JavaScript, or
compile it a virtual machine or native code. We also have interpreters
for machine code, which can be used to emulate one kind of hardware on
another. Or, one might use a compiler to generate object code that is
then the source code for another compiler, which might even compile
code in memory just in time for it to run, which in turn . . . you get
the idea. There are many ways to combine these concepts.
The biggest advantage of interpreted source code over compiled source code is PORTABILITY.
If your source code is compiled, you need to compile a different executable for each type of processor and/or platform that you want your program to run on (e.g. one for Windows x86, one for Windows x64, one for Linux x64, and so on). Furthermore, unless your code is completely standards compliant and does not use any platform-specific functions/libraries, you will actually need to write and maintain multiple code bases!
If your source code is interpreted, you only need to write it once and it can be interpreted and executed by an appropriate interpreter on any platform! It's portable! Note that an interpreter itself is an executable program that is written and compiled for a specific platform.
An advantage of compiled code is that it hides the source code from the end user (which might be intellectual property) because instead of deploying the original human-readable source code, you deploy an obscure binary executable file.
A compiler and an interpreter do the same job: translating a programming language to another pgoramming language, usually closer to the hardware, often direct executable machine code.
Traditionally, "compiled" means that this translation happens all in one go, is done by a developer, and the resulting executable is distributed to users. Pure example: C++.
Compilation usually takes pretty long and tries to do lots of expensive optmization so that the resulting executable runs faster. End users don't have the tools and knowledge to compile stuff themselves, and the executable often has to run on a variety of hardware, so you can't do many hardware-specific optimizations. During development, the separate compilation step means a longer feedback cycle.
Traditionally, "interpreted" means that the translation happens "on the fly", when the user wants to run the program. Pure example: vanilla PHP. A naive interpreter has to parse and translate every piece of code every time it runs, which makes it very slow. It can't do complex, costly optimizations because they'd take longer than the time saved in execution. But it can fully use the capabilities of the hardware it runs on. The lack of a separrate compilation step reduces feedback time during development.
But nowadays "compiled vs. interpreted" is not a black-or-white issue, there are shades in between. Naive, simple interpreters are pretty much extinct. Many languages use a two-step process where the high-level code is translated to a platform-independant bytecode (which is much faster to interpret). Then you have "just in time compilers" which compile code at most once per program run, sometimes cache results, and even intelligently decide to interpret code that's run rarely, and do powerful optimizations for code that runs a lot. During development, debuggers are capable of switching code inside a running program even for traditionally compiled languages.
First, a clarification, Java is not fully static-compiled and linked in the way C++. It is compiled into bytecode, which is then interpreted by a JVM. The JVM can go and do just-in-time compilation to the native machine language, but doesn't have to do it.
More to the point: I think interactivity is the main practical difference. Since everything is interpreted, you can take a small excerpt of code, parse and run it against the current state of the environment. Thus, if you had already executed code that initialized a variable, you would have access to that variable, etc. It really lends itself way to things like the functional style.
Interpretation, however, costs a lot, especially when you have a large system with a lot of references and context. By definition, it is wasteful because identical code may have to be interpreted and optimized twice (although most runtimes have some caching and optimizations for that). Still, you pay a runtime cost and often need a runtime environment. You are also less likely to see complex interprocedural optimizations because at present their performance is not sufficiently interactive.
Therefore, for large systems that are not going to change much, and for certain languages, it makes more sense to precompile and prelink everything, do all the optimizations that you can do. This ends up with a very lean runtime that is already optimized for the target machine.
As for generating executables, that has little to do with it, IMHO. You can often create an executable from a compiled language. But you can also create an executable from an interpreted language, except that the interpreter and runtime is already packaged in the exectuable and hidden from you. This means that you generally still pay the runtime costs (although I am sure that for some language there are ways to translate everything to a tree executable).
I disagree that all languages could be made interactive. Certain languages, like C, are so tied to the machine and the entire link structure that I'm not sure you can build a meaningful fully-fledged interactive version
This is one of the biggest misunderstood things in computer science as I guess.
Because interpretation and compilation are completely two different things, which we can't compare in this way.
The compilation is the process of translating one language into another language. There are few types of compilations.
Compiling - Translate high-level language into machine/byte code (ex: C/C++/Java)
Transpiling - Translate high-level language into another high-level language (ex: TypeScript)
Interpretation is the process of actually executing the program. This may happen in few different ways.
Machine level interpretation - This interpretation happens to the code which is compiled into machine code. Instructions are directly interpreted by the processor. Programming languages like C/C++ generate machine code, which is executable by the processor. So the processor can directly execute these instructions.
Virtual machine level interpretation - This interpretation happens to the code which is not compiled into the machine level (processor support) code, but into some intermediate-level code. This execution is done by another software, which is executed by the processor. At this time actually processor doesn't see our application. It just executing the virtual machine, which is executing our application. Programming languages like Java, Python, C# generate a byte code, which is executable by the virtual interpreter/machine.
So at the end of the day what we have to understand is, all the programming languages in the world should be interpreted at some time. It may be done by a processor(hardware) or a virtual machine.
The compilation is just the process of bringing the high-level code we write that is human-understandable into some hardware/software machine-understandable level.
These are completely two different things, which we can't compare. But that terminology is pretty much good to teach beginners how programming languages work.
PS:
Some programming languages like Java have a hybrid approach to do this. First, compile the high-level code into byte code which is virtual-machine readable. And on the fly, a component called the JIT compiler compiles byte-code into machine code. Specifically, code lines that are executed again and again many times are get translated into the machine language, which makes the interpretation process much faster. Because hardware processor is always much faster than virtual interpreter/processor.
How Java JIT compiler works
It's rather difficult to give a practical answer because the difference is about the language definition itself. It's possible to build an interpreter for every compiled language, but it's not possible to build an compiler for every interpreted language. It's very much about the formal definition of a language. So that theoretical informatics stuff noboby likes at university.
The Python Book © 2015 Imagine Publishing Ltd, simply distunguishes the difference by the following hint mentioned in page 10 as:
An interpreted language such as Python is one where the source code is converted to machine code and then executed each time the program runs. This is different from a compiled language such as C, where the source code is only converted to machine code once – the resulting machine code is then executed each time the program runs.
Compile is the process of creating an executable program from code written in a compiled programming language. Compiling allows the computer to run and understand the program without the need of the programming software used to create it. When a program is compiled it is often compiled for a specific platform (e.g. IBM platform) that works with IBM compatible computers, but not other platforms (e.g. Apple platform).
The first compiler was developed by Grace Hopper while working on the Harvard Mark I computer. Today, most high-level languages will include their own compiler or have toolkits available that can be used to compile the program. A good example of a compiler used with Java is Eclipse and an example of a compiler used with C and C++ is the gcc command. Depending on how big the program is it should take a few seconds or minutes to compile and if no errors are encountered while being compiled an executable file is created.check this information
Short (un-precise) definition:
Compiled language: Entire program is translated to machine code at once, then the machine code is run by the CPU.
Interpreted language: Program is read line-by-line and as soon as a line is read the machine instructions for that line are executed by the CPU.
But really, few languages these days are purely compiled or purely interpreted, it often is a mix. For a more detailed description with pictures, see this thread:
What is the difference between compilation and interpretation?
Or my later blog post:
https://orangejuiceliberationfront.com/the-difference-between-compiler-and-interpreter/

Categories

Resources