How can a JVM be written in Java - java

I was briefly reading about Maxine, which is an open source JVM implementation written in Java. This sounds circular to me. If Java requires a virtual machine to run in, how can the virtual machine itself be written in Java (won't the VM code require a VM in which to run, and so on)?
Edit: Ok, so I see I overlooked the fact that Java doesn't have to run in a VM. How then does one explain how a LISP compiler can be written in LISP? Or should this be a new question altogether?

Your assumption that Java requires a virtual machine is incorrect to begin with. Check out the project GCJ: The GNU Compiler for the Java Programming Language.

You are asking about the chicken and the egg.
Read: http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29

The JVM that you need to bootstrap a JVM written in Java probably does not need a lot of features (such as garbage collection and JIT), so it could be very simple. All the more advanced features could then be implemented in Java (which seems to be exactly the point of Maxine: to experiment with new ideas in JVM technology).
Also, Maxine does contain C code, which I guess makes up a minimal runtime environment that is used to get the rest of Maxine going. I take it that the interesting bits (JIT compiler, garbage collection) are then completely implemented in Java.
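To make that less abstract, here is a deliberately tiny sketch (invented for illustration; nothing like Maxine's real code or real JVM bytecode) of the core of a bytecode interpreter written in Java: a loop dispatching on the opcodes of a toy stack machine.

// Hypothetical toy example: a stack-machine interpreter written in Java.
// The opcodes and program layout are made up for illustration only.
public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, PRINT = 2, HALT = 3;

    public static void run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH:  stack[sp++] = code[pc++]; break;          // push a literal
                case ADD:   stack[sp - 2] += stack[sp - 1]; sp--; break; // add top two values
                case PRINT: System.out.println(stack[--sp]); break;   // pop and print
                case HALT:  return;
            }
        }
    }

    public static void main(String[] args) {
        // "Program": push 2, push 3, add, print -> prints 5
        run(new int[] { PUSH, 2, PUSH, 3, ADD, PRINT, HALT });
    }
}

A real JVM written in Java is of course vastly more involved, but the shape is the same: Java code that reads and acts on bytecodes.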

See bootstrapping.

Java code can be compiled directly to machine code so that a virtual machine is not needed.

I had a look at Maxine last week and was wondering the same :)
From the Maxine documentation:
1 Building the boot image
Now let's build a boot image. In this step, Maxine runs on a host JVM to configure a prototype, then compiles its own code and data to create an executable program for the target platform.
2 Running Maxine
Now that Maxine has compiled itself, we can run it as a standard Java VM. The max vm command handles the details of class and library paths and provides an interface similar to the standard java launcher command.
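In practice (going from memory of the Maxine docs, so the exact sub-commands and options may differ between versions) the workflow looks roughly like:

$ max image                  # build the Maxine boot image on a host JVM
$ max vm -cp . HelloWorld    # run a class on the freshly built Maxine VM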

You can have a look at the well-established method of bootstrapping compilers. It goes back a long way; the first self-hosting compilers appeared in the late 1950s and early 1960s.

It is kinda 'whooaoaa man, how can that work???' - but I think you are describing the phenomenon known as 'self-hosting':
Languages (or toolchains/platforms) don't start out as self-hosting; they start off life built on an existing platform. At a certain point they become functional enough that you can write, in the language itself, programs which understand the syntax of the very language they are written in.
There is a great example in the classic AWK book, which introduces an AWK program that can parse other AWK programs (a cut-down version of the language, as it happens): see the link below.
There is another example in the book "Beautiful Code" which has a Javascript program which can parse Javascript.
The thing to remember is this: if you have (say) a JVM written in Java which can therefore run Java bytecode, then the JVM which runs that Java JVM has itself to be hosted natively somewhere along the line (perhaps that JVM was written in C and compiled to machine code). This is eventually true of any self-hosting program.
So the mystery is removed - because at some point, there is a native machine-code program running below everything.
It's kind of equivalent to being able to describe the English language using the English language itself... maybe...
http://www.amazon.co.uk/AWK-Programming-Language-Alfred-Aho/dp/020107981X/ref=sr_1_fkmr0_3?ie=UTF8&qid=1266397076&sr=8-3-fkmr0
http://www.amazon.co.uk/gp/search/ref=a9_sc_1?rh=i%3Astripbooks%2Ck%3Abeautiful+code&keywords=beautiful+code&ie=UTF8&qid=1266397435
http://en.wikipedia.org/wiki/Self-hosting

I know this post is old but I thought I might add a little to the discussion as they are points that have been missed. So future readers may find this helpful.
I wonder if everyone is missing the point here. You can write almost any kind of compiler, interpreter, or virtual machine in almost any language. When using C to write a C compiler, a C compiler is needed to compile the new compiler; however, the output is native code that runs on the designated platform. Just because the JVM is written in the language that runs on the JVM doesn't mean the output must be code that runs on the JVM. For instance, you can write C, Basic, or Pascal compilers, or even assemblers, in Java. In this case you need the JVM to create the compiler or assembler, but once it is created you may no longer need the JVM, because the result is native code.
Another approach is to write a translator that takes an input language and converts it to a native machine language, so that you write your program in language A, which compiles into language B, which is then compiled into machine code. In the microcontroller world you see this a lot: someone wants to write programs in Basic or Java, so they write a Basic/Java compiler that produces C code for an existing C compiler. The resulting C code is then compiled into machine language, providing the native Basic/Java compiler. This approach is usually easier than writing the Basic/Java compiler directly in machine code.
Many years ago I wrote BasicA and GWBasic programs that produced assembly code for 6800 and Z80 micros. My point is that the output need not be of the same ilk as the input or the target, i.e. just because you're writing a JVM in Java doesn't mean the final result must run under a Java JVM.
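As a toy illustration of that translate-to-C approach (completely invented here, not any real product): a few lines of Java that turn a made-up mini-Basic PRINT statement into C source, which you would then feed to an existing C compiler.

import java.util.List;

// Hypothetical mini-example: translate lines like  PRINT "HELLO"  into C source.
public class MiniBasicToC {
    public static String translate(List<String> basicLines) {
        StringBuilder c = new StringBuilder("#include <stdio.h>\nint main(void) {\n");
        for (String line : basicLines) {
            String t = line.trim();
            if (t.startsWith("PRINT ")) {
                // Emit a printf for the quoted string that follows PRINT.
                c.append("    printf(\"%s\\n\", ").append(t.substring(6)).append(");\n");
            }
        }
        return c.append("    return 0;\n}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(translate(List.of("PRINT \"HELLO\"", "PRINT \"WORLD\"")));
    }
}

You need a JVM to run the translator, but the C it emits compiles to a native executable that needs no JVM at all.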

Here is a good paper on bootstrapping a self-hosted VM. It's not Java but JavaScript; the principles are the same, though.
Bootstrapping a self-hosted research virtual machine for JavaScript: an experience report
Note that while bootstrapping a self-hosted compiler and bootstrapping a self-hosted VM are somewhat similar, I believe they do not raise exactly the same challenges.

Related

How exactly does the Java interpreter or any interpreter work?

I have been trying to figure out exactly how an interpreter works. I have googled around and come up with some conclusions; I just want them to be corrected by someone who can give me a better understanding of how an interpreter works.
So what I have understood is:
An interpreter is a software program that converts code from a high-level language to machine format.
Speaking specifically about the Java interpreter, it gets code in binary format (which was earlier translated by the Java compiler from source code to bytecode).
Now the platform for a Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
Now I am still not clear about the sub-process that happens in between, i.e.:
the interpreter produces intermediate code.
the interpreted code is then optimized.
then target code is generated.
and finally executed.
Some more questions:
So is the interpreter alone responsible for generating the target code? And executing it?
And does executing mean it gets executed in the JVM or in the underlying OS?
An interpreter is a software program that converts code from a high-level language to machine format.
No. That's a compiler. An interpreter is a computer program that executes the instructions written in a language directly. This is different from a compiler, which converts a higher-level language into a lower-level language. The C compiler goes from C to assembly code, and the assembler (another type of compiler) translates from assembly to machine code; modern C compilers do both steps to go from C to machine code.
In Java, the java compiler does code verification and converts from Java source to byte-code class files. It also does a number of small processing tasks such as pre-calculation of constants (if possible), caching of strings, etc.
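You can see that constant pre-calculation for yourself with the javap disassembler that ships with the JDK. For a class whose main method contains, say, int x = 2 + 3; the compiler stores the already-folded constant (illustrative, trimmed javap -c output; the class name Calc is just an example):

$ javac Calc.java
$ javap -c Calc
    ...
    iconst_5      // 2 + 3 was folded to 5 by javac, not at runtime
    istore_1
    ...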
Now the platform for a Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
The JVM operates on the bytecode directly. The java interpreter is integrated so closely with the JVM that they shouldn't really be thought of as separate entities. What also is happening is a crap-ton of optimization where bytecode is basically optimized on the fly. This makes calling it just an interpreter inadequate. See below.
So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
The JVM is doing these translations.
The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
I'd rather say that the JVM uses the bytecode, optimized user code, the java libraries which include java and native code, in conjunction with OS calls to execute java applications.
Now I am still not clear about the sub-process that happens in between, i.e. 1. the interpreter produces intermediate code. 2. the interpreted code is then optimized. 3. then target code is generated. 4. and finally executed.
The Java compiler generates bytecode. When the JVM executes the code, steps 2-4 happen at runtime inside of the JVM. It is very different from C (for example), which has these separate steps run by different utilities. Don't think about these as "subprocesses"; think about them as modules inside of the JVM.
So is the interpreter alone responsible for generating the target code? And executing it?
Sort of. The JVM's interpreter by definition reads the bytecode and executes it directly. However, in modern JVMs, the interpreter works in tandem with the Just-In-Time compiler (JIT) to generate native code on the fly so that the JVM can have your code execute more efficiently.
In addition, there are post-processing "compilation" stages which analyze the generated code at runtime so that native code can be optimized by inlining often-used code blocks and through other mechanisms. This is the reason why the JVM load spikes so high on startup. Not only is it loading in the jars and class files, but it is in effect doing a cc -O3 on the fly.
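If you want to watch this happen, HotSpot exposes switches for it (these flags are HotSpot-specific, other JVMs differ, and MyApp stands in for your own main class):

$ java -Xint MyApp                    # force pure interpretation, no JIT
$ java -XX:+PrintCompilation MyApp    # log each method as the JIT compiles it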
And does executing mean it gets executed in the JVM or in the underlying OS?
Although we talk about the JVM executing the code, this is not technically correct. As soon as the byte-code is translated into native code, the execution of the JVM and your java application is done by the CPU and the rest of the hardware architecture.
The Operating System is the substrate that does all of the process and resource management so that programs can share the hardware and execute efficiently. The OS also provides the APIs for applications to easily access the disk, network, memory, and other hardware and resources.
1) An interpreter is a software program that converts code from a high-level language to machine format.
Incorrect. An interpreter is a program that runs a program expressed in some language that is NOT the computer's native machine code.
There may be a step in this process in which the source language is parsed and translated to an intermediate language, but this is not a fundamental requirement for an interpreter. In the Java case, the bytecode language has been designed so that neither parsing nor a distinct intermediate language is necessary.
2) Speaking specifically about the Java interpreter, it gets code in binary format (which was earlier translated by the Java compiler from source code to bytecode).
Correct. The "binary format" is Java bytecodes.
3) Now the platform for a Java interpreter is the JVM, in which it runs, so basically it is going to produce code which can be run by the JVM.
Incorrect. The bytecode interpreter is part of the JVM. The interpreter doesn't run on the JVM. And the bytecode interpreter doesn't produce anything. It just runs the bytecodes.
4) So it takes the bytecode, produces intermediate code and the target machine code, and gives it to the JVM.
Incorrect.
5) The JVM in turn executes that code on the OS platform on which the JVM is implemented or being run.
Incorrect.
The real story is this:
The JVM has a number of components to it.
One component is the bytecode interpreter. It executes bytecodes pretty much directly [1]. You can think of the interpreter as an emulator for an abstract computer whose instruction set is bytecodes.
A second component is the JIT compiler. This translates bytecodes into the target machine's native machine code so that it can be executed by the target hardware.
[1] A typical bytecode interpreter does some work to map abstract stack frames and object layouts to concrete ones involving target-specific sizes and offsets. But to call this an "intermediate code" is a stretch. The interpreter is really just enhancing the bytecodes.
Giving a 1000 foot view which will hopefully clear things up:
There are 2 main steps to a java application: compilation, and runtime. Each process has very different functions and purposes. The main processes for both are outlined below:
Compilation
This is (normally) executed by com.sun.tools.javac, usually found in the tools.jar file, traditionally in your $JAVA_HOME - the same place as java.jar, etc.
The goal here is to translate .java source files into .class files which contain the "recipe" for the java runtime environment.
Compilation steps:
Parsing: the files are read and stripped of their 'boundary' syntax characters, such as curly braces, semicolons, and parentheses. These exist to tell the parser which Java object to translate each source component into (more about this in the next point).
AST creation: The Abstract Syntax Tree is how a source file is represented. This is a literal "tree" data structure, and the root class for this is com.sun.tools.JCTree. The overall idea is that there is a Java object for each Expression and each Statement. At this point relatively little is known about the actual "types" each node represents; the only thing checked at the creation of the AST is literal syntax.
Desugar: This is where for loops and other syntactic sugar are translated into simpler form. The language is still in 'tree' form and not bytecode, so this can happen easily.
Type checking/Inference: Where the compiler gets complex. Java is a static language, so the compiler has to go over the AST using the Visitor pattern, figure out the types of everything ahead of time, and make sure that at runtime everything (well, almost everything) will be legal as far as types, method signatures, etc. go. If something is too vague or invalid, compilation fails.
Bytecode: Control flow is checked to make sure that the program's execution logic is valid (no unreachable statements, etc.). If everything passes the checks without errors, the AST is translated into the bytecodes that the program represents.
.class file writing: At this point, the class files are written. Essentially, the bytecode is a small layer of abstraction on top of specialized machine code. This makes it possible to port to other machines/CPU structures/platforms without having to worry about the relatively small differences between them.
Runtime
There is a different Runtime Environment/Virtual Machine implementation for each computer platform. The Java APIs are universal, but the runtime environment is an entirely separate piece of software.
The JRE only knows how to translate bytecode from the class files into machine code compatible with the target platform, code that is also highly optimized for that platform.
There are many different runtime/vm implementations, but the most popular one is the Hotspot VM.
The VM is incredibly complex and optimizes your code at runtime. Startup times are slow but it essentially "learns" as it goes.
This is the 'JIT' (Just-in-time) concept in action - the compiler did all of the heavy lifting by checking for correct types and syntax, and the VM simply translates and optimizes the bytecode to machine code as it goes.
Also...
The Java compiler API was standardized under JSR 199. While not exactly falling under the same thing (I can't find the exact JLS reference), many other languages and tools leverage the standardized compilation process/API in order to use the advanced JVM (runtime) technology that Oracle provides, while allowing for different syntax.
See Scala, Groovy, Kotlin, Jython, JRuby, etc. All of these leverage the Java Runtime Environment because they translate their different syntax to be compatible with the Java compiler API! It's pretty neat - anyone can write a high-performance language with whatever syntax they want because of the decoupling of the two. There are adaptations of almost every single language for the JVM.
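For the curious, the JSR 199 API mentioned above lives in javax.tools and can be driven from a few lines of Java. A minimal sketch (the file name HelloWorld.java is just an example) that compiles a source file with the system compiler:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileDemo {
    public static void main(String[] args) {
        // Returns null when only a JRE (without a compiler) is installed.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, "HelloWorld.java");
        System.out.println(status == 0 ? "compiled OK" : "compilation failed");
    }
}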
I'll answer based on my experience on creating a DSL.
C is compiled because you pass the source code to gcc and then run the stored program, which is machine code.
Python is interpreted because you run programs by passing the program source to the interpreter. The interpreter reads the source file and executes it.
Java is a mix of both: you "compile" the Java file to bytecode, then invoke the JVM to run it. Bytecode isn't machine code; it needs to be interpreted by the JVM. Java sits in a tier between C and Python because you cannot do fancy things like an "eval" (evaluating chunks of code or expressions at runtime, as in Python). However, Java has reflection abilities that are impossible for a C program to have. In short, the design of the Java runtime, sitting in an intermediate tier between a purely compiled and an interpreted language, gives the best (and the worst) of the two worlds in terms of performance and flexibility.
However, Python also has a virtual machine and its own bytecode format. The same applies to Perl, Lua, etc. Those interpreters first convert a source file to bytecode, then they interpret the bytecode.
I always wondered why doing this is necessary, until I made my own interpreter for a simulation DSL. My interpreter does lexical analysis (breaking the source into tokens), converts it to an abstract syntax tree, then evaluates the tree by traversing it. For software engineering's sake I'm using some design patterns, and my code heavily uses polymorphism. This is very slow in comparison to processing an efficient bytecode format that mimics a real computer architecture. My simulations would be way faster if I created my own virtual machine or used an existing one. For evaluating a long numeric expression, for instance, it would be faster to translate it to something similar to assembly code than to process a branch of an abstract tree, since that requires calling a lot of polymorphic methods.
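To make the contrast concrete, here is a deliberately tiny sketch (invented for this answer, not my actual DSL code; it uses records, so Java 16+) of the two styles for evaluating 1 + 2 + 3: a polymorphic AST walk versus a flat "bytecode" array processed by a loop. The second form avoids the object hopping and virtual calls that make tree walking slow.

// Style 1: walking an abstract syntax tree with polymorphic nodes.
interface Node { int eval(); }
record Num(int value) implements Node { public int eval() { return value; } }
record Add(Node left, Node right) implements Node {
    public int eval() { return left.eval() + right.eval(); }
}

// Style 2: the same expression flattened into a tiny "bytecode" program.
class StackMachine {
    static final int PUSH = 0, ADD = 1;

    static int run(int[] code) {
        int[] stack = new int[8];
        int sp = 0;
        for (int pc = 0; pc < code.length; ) {
            switch (code[pc++]) {
                case PUSH -> stack[sp++] = code[pc++];
                case ADD  -> { stack[sp - 2] += stack[sp - 1]; sp--; }
            }
        }
        return stack[0];
    }

    public static void main(String[] args) {
        Node tree = new Add(new Add(new Num(1), new Num(2)), new Num(3));
        System.out.println(tree.eval());                                        // 6
        System.out.println(run(new int[] { PUSH, 1, PUSH, 2, ADD, PUSH, 3, ADD })); // 6
    }
}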
There are two ways of executing a program.
By way of a compiler: this translates a text in the programming language (say a .c file) to machine code (on Windows, an .exe). This can then be executed independently of the compiler.
This compilation can be done by compiling several .c files to several object files (intermediate products), and then linking them into a single application or library.
By way of an interpreter: this parses a text in the programming language (say .java) and "immediately" executes the program.
With Java the approach is a bit hybrid/stacked: the Java compiler javac compiles .java to .class files, and possibly zips those into a .jar (or .war, .ear ...). The .class files consist of a more abstract byte code, for an abstract stack machine.
Then the Java runtime java (called the JVM, Java virtual machine, or byte code interpreter) can execute a .class/.jar. This is in fact an interpreter of Java byte code.
Nowadays it also translates (parts of) the byte code at run time to machine code. This is also called a just-in-time compiler for byte code to machine code.
In short:
- a compiler just creates code;
- an interpreter immediately executes.
An interpreter will loop over parsed commands / a high-level intermediate code, and interpret every command with a piece of code. Indirect and, in principle, slow.

How to run java program without JVM?

I have a simple Java program that just prints Hello World. Is it possible to run it without a JVM installed on the machine? The compiler is there, though.
You can compile Java Byte Code to Machine Code using something like this:
http://en.wikipedia.org/wiki/GNU_Compiler_for_Java
Or you can use any of the MANY Java to C/C++ converters out there to make a C/C++ version, and compile that. (Search Google for "Java C Converter")
Warning: I have never tried any of these, including the linked GNU Compiler, so I cannot make any attestations to their usefulness or quality.
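For reference, the classic GCJ invocation looked roughly like the following (I haven't run this either, and GCJ has since been removed from GCC, so treat it as historical):

$ gcj --main=HelloWorld -o hello HelloWorld.java
$ ./hello
Hello World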
@SplinterReality already pointed you to some compilers (googling "java bytecode to native code compiler" will show you some more options).
I will just expand on that seeing some people are a bit confused:
The JVM is a program that takes java bytecode, which you get by running javac on your java source code (or you generate it in some other fashion). Bytecode is just a set of instructions that the JVM understands, it's there to give you a consistent set of opcodes which are platform independent. It's the JVM's job to then map those opcodes to native instructions, that's why JVMs are platform dependent. That's why when you compile your java source code to java bytecode you can run it on every platform.
That's why, whether you have java source or bytecode, you can take it and just compile it directly to native code (platform dependent, actually sometimes that's exactly what the JVM, and to be more precise the JIT, does - it compiles stuff to native code when it sees the need to). Still there's more to the JVM than just bytecode interpretation - for instance you need to take care of garbage collection.
So the answer is: yes, but do you really want to do it? Unless you want to be running it on some embedded systems I don't really see any reason to.
Yes, it is possible to run a Java program without a JVM, albeit with limitations. Aside from the GNU Compiler for Java (http://en.wikipedia.org/wiki/GNU_Compiler_for_Java), there is the GraalVM native-image generator, which can be used: https://www.graalvm.org.
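A rough sketch of the GraalVM route (exact flags and the default executable name may vary between GraalVM releases):

$ javac HelloWorld.java
$ native-image HelloWorld      # builds a standalone native executable
$ ./helloworld
Hello World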
If that program is a one-file source program (not compiled to bytecode), then you can ;)
There are online Java compilers that will do fine for this ;)
Just put it there and press compile and run.
But it will only work with simple stuff, and you have to have the source code.
I would not be surprised if there were a site that allowed uploading a class file from the user's PC.

What interprets Java's byte code

I was wondering if Java gets assembled, and in my reading I found that the compiler creates byte code which is then run on the Java Virtual Machine. Does the JVM interpret the byte code and execute it?
This is why I'm confused. Today in class the prof said "the compiler takes a high level language, creates assembly language, then the assembler takes the assembly language and creates machine language (binary) which can be run". So if Java compiles to bytecode how can it be run?
There is a standard compiler setup, such as would be used for the C language, and then there is Java, which is significantly different.
The standard C compiler compiles (through several internal phases) into "machine instructions" which are directly understood by the x86 processor or whatever.
The Java compiler, on the other hand, compiles to what are sometimes called "bytecodes". These are machine instructions, but for an imaginary machine, the Java Virtual Machine. So the JVM interprets the bytecodes just like a "real" machine processes its machine instructions. (The main advantage of this is that a program compiled into bytecodes will run on any JVM, whether it be on an x86 system, an IBM RISC box, or the ARM processor in an Android phone -- so long as there's a JVM, the code will run.)
(There have historically been a number of "virtual machines" similar to Java, the UCSD Pascal "P-code" system being one of the more successful ones.)
But it gets more complicated --
Interpreting "bytecodes" is fairly slow and inefficient, so most Java implementations have some sort of scheme to translate the bytecodes into "real" machine instructions. In some cases this is done statically, in a separate compile step, but most often it's done with a "just-in-time compiler" (JITC) which converts small portions of the bytecodes to machine instructions while the application is running. These get to be quite elaborate, with complex schemes to decide which segments of code will benefit most from translating into hardware machine instructions. But they all, for the most part, do their magic without you needing to be aware of what's going on, and without you having to compile your Java code to target a specific type of processor.
Think of bytecode as the machine language of the JVM. (Compilers don't HAVE to produce assembly code which has to be assembled, but they're a lot easier to write that way.)
Just a clarifying note:
That which in Java is called "bytecode" corresponds, in your original description, to the "machine language (binary) which can be run".
So the answer to how to run java bytecode is:
You build a processor which can handle Java bytecode, in the same way that, if you want to execute normal x86 code, you build a CPU to handle that (that would be a normal Intel/AMD CPU).
Java's binary machine language is not really different from the binary instruction format of other CPUs such as x86 or PowerPC, and there have even existed CPUs which can execute Java bytecode directly.
Another example: how would you run PowerPC code on a normal Intel CPU? You would build a piece of software which at runtime translates the PowerPC binary code to x86 code. The case for Java is not really that different. So to run Java code on an x86 CPU, you need a program which translates the Java binary code (aka the bytecode) to x86 binary code. This is what the JVM* does. And it does this either by interpreting the Java instructions one at a time, or by translating a large chunk of instructions at a time (called JIT). Exactly how the JVM handles the translation depends on which JVM implementation you use and its settings. (There are multiple independent JVM implementations which implement their translation in different ways.)
But there is one thing which makes Java a bit different. Unlike other binary instruction formats such as x86, Java was never really designed to run directly on a CPU. Its binary format is designed in a way which makes it easy to translate into binary code for "normal" CPUs such as x86 or PowerPC.
*The JVM does in fact handle more than just translating the Java binary code to processor-dependent code. It also handles memory allocation for Java programs, and it handles communication between a Java program and the user's operating system. This is done to make the Java program relatively independent of the user's operating system and platform details.
In a short explanation: The JVM translates the Java Byte Code into machine specific code. The generated machine specific code is then executed by the machine.
The Java compiler translates Java into bytecode. The JVM translates bytecode into machine-specific code at runtime. The machine executes that machine-specific code.

What is the difference between "architecture-neutral" and "portable"?

I'm reading Herbert Schildt's book "Java: The Complete Reference" and there he writes that Java is portable AND architecture-neutral. What is the difference between this two concepts? I couldn't understand it from the text.
Take a look at this white paper on Java.
Basically they're saying that in addition to running on multiple environments (because of being interpreted within the JVM), it also runs the same regardless of environment. The former is what makes it portable, the latter is what makes it architecture-neutral. For example, the size of an int does not vary based on platform; it's established by the JVM.
A portable C program:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("Hello, World!");
    return (EXIT_SUCCESS);
}
You can take that C program and compile it on any machine with a C compiler and have it work (assuming it supports printf... I am guessing some things out there may not).
If you compile it on Windows and try to run that binary on a Mac it won't work.
The same sort of program written in Java will also compile on any machine with a Java compiler installed, but the resulting .class file will also run on any machine with a Java VM. That is the architectural neutral part.
So, portable is a source code idea, while architectural neutral is an executable idea.
Looking around I found another book that describes the difference between the two.
For architecture neutral, the compiler generates an architecture-neutral object file, meaning that compiled Java code (bytecode) can run on many processors, given the presence of a Java runtime.
For portable, it means there are no implementation-dependent aspects of the specification. For instance, in C++ an int can be 16-bit or 32-bit depending on who is implementing the specification, whereas in Java an int is always 32-bit.
I got my information from a different book (Core Java 2: Fundamentals) so it may differ from his meaning. Here is a link: Core Java 2: Fundamentals
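A tiny illustration of that point: the following prints the same three values on every JVM and operating system, because the Java Language Specification pins int to 32 bits, whereas a C or C++ program's sizeof(int) can differ between compilers.

public class IntWidth {
    public static void main(String[] args) {
        // Always a 32-bit two's-complement int, on every platform.
        System.out.println(Integer.SIZE);        // 32
        System.out.println(Integer.MIN_VALUE);   // -2147483648
        System.out.println(Integer.MAX_VALUE);   // 2147483647
    }
}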
With architecture-neutral, the book means that the byte code is independent of the underlying platform that the program is running on. For example, it doesn't matter if your operating system is 32-bit or 64-bit, the Java byte code is exactly the same. You don't have to recompile your Java source code for 32-bit or 64-bit. (So, "architecture" refers to the CPU architecture).
"Portable" means that a program written to run on one operating system works on another operating system without changing anything. With Java, you don't even have to recompile the source code; a *.class file compiled on Windows, for example, works on Linux, Mac OS X or any other OS for which you have a Java virtual machine available.
Note that you have to take care with some things to make your Java code truly portable. If you for example hard-code Windows-style file paths (C:\Users\Myself...) in your Java application, it is not going to work on other operating systems.
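For example, building the path through the standard library instead of hard-coding separators keeps the same class file working on Windows and Unix-like systems (the file name data.txt is just an example):

import java.nio.file.Path;
import java.nio.file.Paths;

public class PortablePath {
    public static void main(String[] args) {
        // Uses the platform's separator: C:\Users\Myself\data.txt on Windows,
        // /home/myself/data.txt style paths elsewhere.
        Path p = Paths.get(System.getProperty("user.home"), "data.txt");
        System.out.println(p);
    }
}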
I suspect that he means that code can run on many platforms without recompilation.
It is also possible to write code that deals with the underlying system without rewrites or conditions.
E.g. Serialized objects from a 32 bit Windows system can be read on a 64bit Linux system.
There are 3 related features in Java.
Platform independent -> this means that the Java program can be run on any OS, regardless of its vendor. It is implemented by using the MAGIC CODE called "BYTE CODE". The JVM then either interprets this at runtime or uses JIT (Just in Time) compilation to compile it to machine code for the architecture it is running on (e.g. i386).
Architecture neutral -> it means the Java program can be run on any processor, irrespective of its vendor and architecture, so it avoids the rebuilding problem.
Portable -> a programming language/technology is said to be purely portable if it satisfies the above two features.
A .class file is portable because it can run on any OS. The reason is that the .class file produced by the Java compiler is the same for every OS. The JVM, on the other hand, differs per OS, yet it accepts that same .class file everywhere, which is what makes Java architecture neutral.
What is difference between Architecture Neutral and Portable?
Architecture neutral: Java is an architecture-neutral programming language because a Java application can be compiled on one hardware architecture and executed on another hardware architecture.
Portable: Java is a portable programming language because a Java application can be executed on any operating system and any hardware system.
In Terms of Java
Java architecture neutral - here we are talking about the underlying architecture: Java generates intermediate byte code (binary code), handled by the JVM, which allows the Java code to run on any OS for which a Java virtual machine is available, irrespective of that OS's architecture. Details such as memory allocation, caching, register handling, and whether the code is processed as 32-bit or 64-bit while being interpreted are handled by the JVM with respect to the hardware and OS configuration.
Portable (generic meaning: transferable, platform independent, or even that the source code is the same for all platforms; simply, support for many).
Java portable means Java code written on one machine will run on any machine that has a proper JVM for its OS.

Compiled vs. Interpreted Languages

I'm trying to get a better understanding of the difference. I've found a lot of explanations online, but they tend towards the abstract differences rather than the practical implications.
Most of my programming experience has been with CPython (dynamic, interpreted) and Java (static, compiled). However, I understand that there are other kinds of interpreted and compiled languages. Aside from the fact that executable files can be distributed from programs written in compiled languages, are there any advantages/disadvantages to each type? Oftentimes, I hear people arguing that interpreted languages can be used interactively, but I believe that compiled languages can have interactive implementations as well, correct?
A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition "+" operation in your source code could be translated directly to the "ADD" instruction in machine code.
An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same "+" operation would be recognised by the interpreter at run time, which would then call its own "add(a,b)" function with the appropriate arguments, which would then execute the machine code "ADD" instruction.
You can do anything that you can do in an interpreted language in a compiled language and vice-versa - they are both Turing complete. Both however have advantages and disadvantages for implementation and use.
I'm going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:
Faster performance by directly using the native code of the target machine
Opportunity to apply quite powerful optimisations during the compile stage
And here are the advantages of interpreted languages:
Easier to implement (writing good compilers is very hard!!)
No need to run a compilation stage: can execute code directly "on the fly"
Can be more convenient for dynamic languages
Note that modern techniques such as bytecode compilation add some extra complexity - what happens here is that the compiler targets a "virtual machine" which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
A language itself is neither compiled nor interpreted, only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpreter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?
Other languages, which are compiled as well as interpreted, are Scala, Haskell or Ocaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.
So generally categorizing languages by "compiled" and "interpreted" doesn't make much sense.
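Regarding the question's aside about interactivity: since JDK 9 even Java ships with a read-eval-print loop, jshell, so interactive use is not reserved for traditionally interpreted languages. A short session looks like this:

$ jshell
jshell> int x = 2 + 3
x ==> 5
jshell> System.out.println("hi")
hi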
Start thinking in terms of a: blast from the past
Once upon a time, long long ago, there lived in the land of computing interpreters and compilers. All kinds of fuss ensued over the merits of one over the other. The general opinion at that time was something along the lines of:
Interpreter: Fast to develop (edit and run). Slow to execute because each statement had to be interpreted into machine code every time it was executed (think of what this meant for a loop executed thousands of times).
Compiler: Slow to develop (edit, compile, link and run; the compile/link steps could take serious time). Fast to execute. The whole program was already in native machine code.
A one or two order of magnitude difference in the runtime performance existed between an interpreted program and a compiled program. Other distinguishing points, run-time mutability of the code for example, were also of some interest, but the major distinction revolved around the run-time performance issues.
Today the landscape has evolved to such an extent that the compiled/interpreted distinction is pretty much irrelevant. Many compiled languages call upon run-time services that are not completely machine code based. Also, most interpreted languages are "compiled" into byte-code before execution. Byte-code interpreters can be very efficient and rival some compiler-generated code from an execution speed point of view.
The classic difference is that compilers generated native machine code, while interpreters read source code and generated machine code on the fly using some sort of run-time system. Today there are very few classic interpreters left - almost all of them compile into byte-code (or some other semi-compiled state) which then runs on a virtual "machine".
The extreme and simple cases:
A compiler will produce a binary executable in the target machine's native executable format. This binary file contains all required resources except for system libraries; it's ready to run with no further preparation and processing and it runs like lightning because the code is the native code for the CPU on the target machine.
An interpreter will present the user with a prompt in a loop where he can enter statements or code, and upon hitting RUN or the equivalent the interpreter will examine, scan, parse and interpretatively execute each line until the program runs to a stopping point or an error. Because each line is treated on its own and the interpreter doesn't "learn" anything from having seen the line before, the effort of converting human-readable language to machine instructions is incurred every time for every line, so it's dog slow. On the bright side, the user can inspect and otherwise interact with his program in all kinds of ways: Changing variables, changing code, running in trace or debug modes... whatever.
With those out of the way, let me explain that life ain't so simple any more. For instance,
Many interpreters will pre-compile the code they're given so the translation step doesn't have to be repeated again and again.
Some compilers compile not to CPU-specific machine instructions but to bytecode, a kind of artificial machine code for a fictitious machine. This makes the compiled program a bit more portable, but requires a bytecode interpreter on every target system.
The bytecode interpreters (I'm looking at Java here) recently tend to re-compile the bytecode they get for the CPU of the target machine just before execution (called JIT). To save time, this is often only done for code that runs often (hotspots).
Some systems that look and act like interpreters (Clojure, for instance) compile any code they get, immediately, but allow interactive access to the program's environment. That's basically the convenience of interpreters with the speed of binary compilation.
Some compilers don't really compile, they just pre-digest and compress code. I heard a while back that's how Perl works. So sometimes the compiler is just doing a bit of the work and most of it is still interpretation.
In the end, these days, interpreting vs. compiling is a trade-off, with time spent (once) compiling often being rewarded by better runtime performance, but an interpretative environment giving more opportunities for interaction. Compiling vs. interpreting is mostly a matter of how the work of "understanding" the program is divided up between different processes, and the line is a bit blurry these days as languages and products try to offer the best of both worlds.
From http://www.quora.com/What-is-the-difference-between-compiled-and-interpreted-programming-languages
There is no difference, because "compiled programming language" and "interpreted programming language" aren't meaningful concepts. Any programming language, and I really mean any, can be interpreted or compiled. Thus, interpretation and compilation are implementation techniques, not attributes of languages.
Interpretation is a technique whereby another program, the interpreter, performs operations on behalf of the program being interpreted in order to run it. If you can imagine reading a program and doing what it says to do step-by-step, say on a piece of scratch paper, that's just what an interpreter does as well. A common reason to interpret a program is that interpreters are relatively easy to write. Another reason is that an interpreter can monitor what a program tries to do as it runs, to enforce a policy, say, for security.
Compilation is a technique whereby a program written in one language (the "source language") is translated into a program in another language (the "object language"), which hopefully means the same thing as the original program. While doing the translation, it is common for the compiler to also try to transform the program in ways that will make the object program faster (without changing its meaning!). A common reason to compile a program is that there's some good way to run programs in the object language quickly and without the overhead of interpreting the source language along the way.
You may have guessed, based on the above definitions, that these two implementation techniques are not mutually exclusive, and may even be complementary. Traditionally, the object language of a compiler was machine code or something similar, which refers to any number of programming languages understood by particular computer CPUs. The machine code would then run "on the metal" (though one might see, if one looks closely enough, that the "metal" works a lot like an interpreter). Today, however, it's very common to use a compiler to generate object code that is meant to be interpreted; for example, this is how Java used to (and sometimes still does) work. There are compilers that translate other languages to JavaScript, which is then often run in a web browser, which might interpret the JavaScript, or compile it to a virtual machine or native code. We also have interpreters for machine code, which can be used to emulate one kind of hardware on another. Or, one might use a compiler to generate object code that is then the source code for another compiler, which might even compile code in memory just in time for it to run, which in turn . . . you get the idea. There are many ways to combine these concepts.
The biggest advantage of interpreted source code over compiled source code is PORTABILITY.
If your source code is compiled, you need to compile a different executable for each type of processor and/or platform that you want your program to run on (e.g. one for Windows x86, one for Windows x64, one for Linux x64, and so on). Furthermore, unless your code is completely standards compliant and does not use any platform-specific functions/libraries, you will actually need to write and maintain multiple code bases!
If your source code is interpreted, you only need to write it once and it can be interpreted and executed by an appropriate interpreter on any platform! It's portable! Note that an interpreter itself is an executable program that is written and compiled for a specific platform.
An advantage of compiled code is that it hides the source code from the end user (which might be intellectual property) because instead of deploying the original human-readable source code, you deploy an obscure binary executable file.
A compiler and an interpreter do the same job: translating a programming language into another programming language, usually one closer to the hardware, often directly executable machine code.
Traditionally, "compiled" means that this translation happens all in one go, is done by a developer, and the resulting executable is distributed to users. Pure example: C++.
Compilation usually takes pretty long and tries to do lots of expensive optimization so that the resulting executable runs faster. End users don't have the tools and knowledge to compile stuff themselves, and the executable often has to run on a variety of hardware, so you can't do many hardware-specific optimizations. During development, the separate compilation step means a longer feedback cycle.
Traditionally, "interpreted" means that the translation happens "on the fly", when the user wants to run the program. Pure example: vanilla PHP. A naive interpreter has to parse and translate every piece of code every time it runs, which makes it very slow. It can't do complex, costly optimizations because they'd take longer than the time saved in execution. But it can fully use the capabilities of the hardware it runs on. The lack of a separate compilation step reduces feedback time during development.
But nowadays "compiled vs. interpreted" is not a black-or-white issue, there are shades in between. Naive, simple interpreters are pretty much extinct. Many languages use a two-step process where the high-level code is translated to a platform-independant bytecode (which is much faster to interpret). Then you have "just in time compilers" which compile code at most once per program run, sometimes cache results, and even intelligently decide to interpret code that's run rarely, and do powerful optimizations for code that runs a lot. During development, debuggers are capable of switching code inside a running program even for traditionally compiled languages.
First, a clarification: Java is not fully statically compiled and linked in the way C++ is. It is compiled into bytecode, which is then interpreted by a JVM. The JVM can go and do just-in-time compilation to the native machine language, but doesn't have to do it.
More to the point: I think interactivity is the main practical difference. Since everything is interpreted, you can take a small excerpt of code, parse it and run it against the current state of the environment. Thus, if you had already executed code that initialized a variable, you would have access to that variable, etc. It really lends itself well to things like a functional style.
Interpretation, however, costs a lot, especially when you have a large system with a lot of references and context. By definition, it is wasteful because identical code may have to be interpreted and optimized twice (although most runtimes have some caching and optimizations for that). Still, you pay a runtime cost and often need a runtime environment. You are also less likely to see complex interprocedural optimizations because at present their performance is not sufficiently interactive.
Therefore, for large systems that are not going to change much, and for certain languages, it makes more sense to precompile and prelink everything, do all the optimizations that you can do. This ends up with a very lean runtime that is already optimized for the target machine.
As for generating executables, that has little to do with it, IMHO. You can often create an executable from a compiled language. But you can also create an executable from an interpreted language, except that the interpreter and runtime are already packaged in the executable and hidden from you. This means that you generally still pay the runtime costs (although I am sure that for some languages there are ways to translate everything to a true executable).
I disagree that all languages could be made interactive. Certain languages, like C, are so tied to the machine and the entire link structure that I'm not sure you can build a meaningful fully-fledged interactive version
This is one of the most misunderstood things in computer science, I guess.
Interpretation and compilation are two completely different things, which we can't compare in this way.
Compilation is the process of translating one language into another language. There are a few types of compilation.
Compiling - Translate high-level language into machine/byte code (ex: C/C++/Java)
Transpiling - Translate high-level language into another high-level language (ex: TypeScript)
Interpretation is the process of actually executing the program. This may happen in a few different ways.
Machine level interpretation - This interpretation happens to the code which is compiled into machine code. Instructions are directly interpreted by the processor. Programming languages like C/C++ generate machine code, which is executable by the processor. So the processor can directly execute these instructions.
Virtual machine level interpretation - This interpretation happens to code which is not compiled into machine-level (processor-supported) code, but into some intermediate-level code. This execution is done by another piece of software, which is itself executed by the processor. At this point the processor doesn't actually see our application; it is just executing the virtual machine, which is executing our application. Programming languages like Java, Python, and C# generate a byte code, which is executable by the virtual interpreter/machine.
So at the end of the day, what we have to understand is that every programming language in the world has to be interpreted at some point, whether by a processor (hardware) or by a virtual machine.
Compilation is just the process of bringing the human-understandable high-level code we write down to some level that a hardware or software machine understands.
These are two completely different things, which we can't compare, but the terminology is good enough to teach beginners how programming languages work.
PS:
Some programming languages like Java have a hybrid approach to do this. First, the high-level code is compiled into byte code, which is virtual-machine readable. Then, on the fly, a component called the JIT compiler compiles byte code into machine code. Specifically, code lines that are executed again and again get translated into machine language, which makes the interpretation process much faster, because a hardware processor is always much faster than a virtual interpreter/processor.
How Java JIT compiler works
It's rather difficult to give a practical answer because the difference is about the language definition itself. It's possible to build an interpreter for every compiled language, but it's not possible to build a compiler for every interpreted language. It's very much about the formal definition of a language. So it's that theoretical informatics stuff nobody likes at university.
The Python Book © 2015 Imagine Publishing Ltd simply distinguishes the two with the following hint, mentioned on page 10:
An interpreted language such as Python is one where the source code is converted to machine code and then executed each time the program runs. This is different from a compiled language such as C, where the source code is only converted to machine code once – the resulting machine code is then executed each time the program runs.
Compile is the process of creating an executable program from code written in a compiled programming language. Compiling allows the computer to run and understand the program without the need of the programming software used to create it. When a program is compiled it is often compiled for a specific platform (e.g. IBM platform) that works with IBM compatible computers, but not other platforms (e.g. Apple platform).
The first compiler was developed by Grace Hopper while working on the UNIVAC I computer. Today, most high-level languages will include their own compiler or have toolkits available that can be used to compile the program. A good example of a compiler used with Java is Eclipse, and an example of a compiler used with C and C++ is the gcc command. Depending on how big the program is, it should take a few seconds or minutes to compile; if no errors are encountered during compilation, an executable file is created.
Short (un-precise) definition:
Compiled language: Entire program is translated to machine code at once, then the machine code is run by the CPU.
Interpreted language: Program is read line-by-line and as soon as a line is read the machine instructions for that line are executed by the CPU.
But really, few languages these days are purely compiled or purely interpreted; it is often a mix. For a more detailed description with pictures, see this thread:
What is the difference between compilation and interpretation?
Or my later blog post:
https://orangejuiceliberationfront.com/the-difference-between-compiler-and-interpreter/
