I'm trying to understand how a Java program interacts with the compiler.
Let's assume we write a simple Java program in a plain text file. At the core level, it's stored as a bit pattern on disk.
Java's compiler is a separate entity, which is also just some sort of bit pattern.
This pattern can consume something it understands: it consumes the Java bit pattern (the so-called Java program) and produces instructions to be processed by the processor.
Where does this process happen, in memory or in the processor? I mean the process where the Java compiler eats Java and produces instructions the processor can understand.
My understanding is that memory is just for loading what we are able to see on screen, whether it comes from disk or from the processor. The Java program and the compiler code both exist on screen and must be loaded into memory to go further.
Then how, and in what sequence, are the processor's instructions created? Where do they interact, and how?
Can anyone please help me understand this? I'm very curious to know. Any book or reference will also work.
It's pretty much the same as any compiler system. The compiler reads source text from a file (.java, in this case), and writes equivalent instructions to another file (.class, in this case).
The execution of the compiler proceeds like the execution of any computer program: the processor executes instructions that come from memory, and those instructions read and write memory. It does not seem appropriate to call this "in memory" or "in the processor" - it's "in the computer" as a whole.
More detail:
The compiler is, when running, instructions loaded in memory being executed by the processor. The compiler will execute instructions to read the .java files from disk into memory so that it, the compiler, can translate the Java code. The compiler will execute instructions to write the compiled code out to disk (as .class files).
The compiler and Java program are not running at the same time. The compiler translates the Java program entirely before that program (the 'application program') is run. First compile .java to .class files; then execute the .class files.
The application program is, when running, instructions loaded in memory being executed by the processor. (This is a simplification for this explanation, ignoring the existence of a thing called the JVM. It does not materially change the overall idea.)
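A minimal sketch of the "compiler is just another program" point: the JDK exposes javac through the standard javax.tools API, so one Java program can run the compiler, which reads a .java file from disk into memory and writes a .class file back out. This assumes a full JDK (without the compiler module, getSystemJavaCompiler() returns null) and Java 11 or later for Files.writeString; the file and class name Hello are arbitrary.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Sketch: run the Java compiler as an ordinary program from inside another
// Java program. Requires a full JDK, not just a runtime.
public class CompileDemo {

    // Writes Hello.java into dir, compiles it, and reports whether Hello.class appeared.
    static boolean compileHello(Path dir) throws Exception {
        Path source = dir.resolve("Hello.java");
        // Step 1: the source is just text (bytes) in a file on disk.
        Files.writeString(source,
            "public class Hello { public static void main(String[] a) {"
            + " System.out.println(\"Hello\"); } }");

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK");
        }
        // Step 2: javac (running as instructions in memory) reads Hello.java,
        // translates it, and writes Hello.class next to it. 0 means success.
        int rc = javac.run(null, null, null, source.toString());
        // Step 3: the result is another file on disk, containing bytecode.
        return rc == 0 && Files.exists(dir.resolve("Hello.class"));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("compile-demo");
        System.out.println("Hello.class created: " + compileHello(dir));
    }
}
```

This is equivalent to typing javac Hello.java in a shell; running java Hello afterwards is a separate, later step, which matches the compile-then-execute order described above.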
This isn't so much about Java as it is about how computers work in general. Data have to be accessible to the processor, and that means that the data need to be in memory. The processor's instructions, i.e., an executing program, can't manipulate data any other way (to an approximation; there are exceptions that are not applicable at the level of explanation we're going for here).
Can Java run directly on hardware (assuming there's enough memory to include the necessary JRE/JVM)? If it can, how does System.out.println work? I'd think there's nowhere for that output to go if it's all just on a CPU.
Directly on hardware? I am assuming you mean to ask whether Java can run on a microcontroller. The answer is yes. The JVM is a virtual machine which is essentially its own operating system. The JVM was designed to do exactly what you're wondering about. The JVM's two primary functions are to allow Java programs to run on any device or operating system ("write once, run anywhere") and to manage and optimize memory.
To answer your second question: in order to see the output of a System.out.println() call, one would simply need to provide the microcontroller with a screen. However, in theory the code would still execute without the output being displayed. So one could write a Java program that prints "hello world", load it onto a microcontroller and run it, but that's just silly.
EDIT: I assumed you were not asking "Can you program a microcontroller with Java?" Silly of me; the answer is yes, you certainly can. However, you wouldn't want to, because the JVM is rather large and would take up a lot of space. That being said, if you are interested, take a look at the STM32 Java-ready microcontrollers or the Renesas RX. Also, you could run a stripped-down JVM using uJ or NanoVM.
The short answer is no. Because of the JVM and Java's "write once, run anywhere" design, the code is not run directly on the hardware but within the JVM. The JVM essentially acts as a middleman between different hardware and operating systems. If you want to accomplish this, take a look at C++.
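To illustrate that println doesn't need a screen in order to execute: System.out is just a PrintStream connected to whatever the runtime sets up, and it can even be pointed at an in-memory buffer. This is a sketch on a desktop JVM (Java 10 or later for the Charset overloads); what System.out is wired to on a particular microcontroller VM depends on that port and is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Sketch: System.out is an ordinary PrintStream. The println call runs the
// same way whether its bytes end up on a console, in a buffer, or nowhere.
public class RedirectDemo {

    static String captureHello() {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // Point System.out at an in-memory buffer instead of the console.
            System.setOut(new PrintStream(buffer, true, StandardCharsets.UTF_8));
            System.out.println("hello world");
        } finally {
            // Restore the original stream so later output reaches the console again.
            System.setOut(original);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String captured = captureHello();
        System.out.println("Captured: " + captured.trim());
    }
}
```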
Is there some way to get reasonable (not noticeable) starting times for Java, thus making it suitable for writing command line scripts (not long-lived apps)?
For a demonstration of the issue, take a simple Hello World program in Java and JavaScript (run with node.js) on my MacBook Pro:
$ time java T
Hello world!
real 0m0.352s
user 0m0.301s
sys 0m0.053s
$ time node T.js
Hello world!
real 0m0.098s
user 0m0.079s
sys 0m0.013s
There is a noticeable lag with the Java version, but not with Node. This makes command line tools seem unresponsive. (This is especially true if they rely on more than one class, unlike the simple T.java above.)
Not likely. The only thing you might be able to try is a different implementation of the JVM, but that probably won't change much. Most Java apps are (relatively) long-lived, though, and possibly interactive, which means the JVM startup time gets lost in the noise of normal use.
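If you want to see how much of the time reported by time java T is spent before your own code even starts, one rough check is to ask the JVM for its uptime at the top of main. A sketch using the standard java.lang.management API; the class name StartupTimer is arbitrary.

```java
import java.lang.management.ManagementFactory;

// Sketch: a Hello World that also reports how long the JVM had been running
// by the time main() started, as a rough proxy for startup overhead.
public class StartupTimer {

    // Milliseconds since the JVM started, as reported by the runtime MXBean.
    // Reading the MXBean loads extra management classes, so the figure
    // includes a little overhead of its own.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        long atMain = uptimeMillis();
        System.out.println("Hello world!");
        System.out.println("JVM uptime when main() began: " + atMain + " ms");
    }
}
```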
Have you actually tried timing a Java command-line app called repeatedly, though? I would expect after the first incarnation for the start-up time to be alleviated somewhat by the library classes being in the file system cache.
That said, yes, the Java platform is not one of the simplest and in any case you're not going to compete with a small native executable.
Edit: since you say that the timings above are for "warmed up" calls, a possible workaround could be:
write the gubbins of your commands in Java
write a simple local continually running "server" that takes commands and passes them to the relevant Java routines
write a simple native command-line wrapper (or write it in something that's fast to start up) whose sole raison d'être is to pass commands on to the Java server and spit out the result.
This ain't nice, but it could allow you to write the gubbins of your routines in Java (which I assume is essentially what you want) while still keeping the command line model of invocation.
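A minimal sketch of the Java "server" half of that workaround, assuming a plain TCP socket on the loopback interface. The port number (5555), the one-line request/reply protocol and the command names (upper, wc) are all hypothetical; the native wrapper's only job would be to connect, send its arguments as one line and print the reply. Assumes Java 10 or later for the Charset-taking PrintWriter constructor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of a long-running JVM that executes commands sent by a thin client.
// Port, protocol and command names are hypothetical.
public class CommandServer {

    // The "gubbins": ordinary Java routines, selected by the first word of the line.
    static String dispatch(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        String arg = parts.length > 1 ? parts[1] : "";
        switch (parts[0]) {
            case "upper": return arg.toUpperCase();
            case "wc":    return Integer.toString(arg.isEmpty() ? 0 : arg.split("\\s+").length);
            default:      return "unknown command: " + parts[0];
        }
    }

    public static void main(String[] args) throws IOException {
        // Bind to loopback only, so other machines cannot reach the server.
        try (ServerSocket server = new ServerSocket(5555, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                // One request per connection: read one line, write one reply.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                         client.getOutputStream(), true, StandardCharsets.UTF_8)) {
                    String line = in.readLine();
                    if (line != null) {
                        out.println(dispatch(line));
                    }
                }
            }
        }
    }
}
```

Because the JVM stays up between requests, only the first request pays the startup and warm-up cost. This is roughly the idea behind nailgun, which offers a much more complete implementation.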
As others have said, the plain answer is just "not really". You can possibly make minor performance improvements, but you're never going to get away from the fact that the VM is going to take a while to start up and get going.
Make sure you haven't got the server VM selected for apps like this - that's one thing that really will increase the start up time.
The only real way round it is to compile Java to native code, which you can do with GCJ - so if you must write these apps in Java and you must have them faster, that might be a route to look down. Bear in mind though it's not that up-to-date and maintenance on it largely seems to be dying out too.
Haven't tried it yet but might be worth looking at nailgun. It will run your Java programs in the same JVM, so after "warming up" should be pretty fast. A "hello world" example goes from taking 0.132s to taking 0.004s
http://www.martiansoftware.com/nailgun/background.html
You can get a small speed-up with class data sharing https://rmannibucau.metawerx.net/post/java-class-data-sharing-docker-startup
A much bigger speedup should come from doing ahead-of-time compilation to a static binary using GraalVM native-image, although it's still tricky to use. A lot of libraries haven't been made compatible.
What are the differences between the classical compilation model (C, C++, etc.) and the Java compilation model?
A proper answer to your question could take several hundred pages to answer, but I'll try to sum it up in a few paragraphs.
Basically, the "classic compilation model" you refer to takes as input human-written source code and emits machine code, which can be loaded and run without further translation of the machine code. One ramification of this is that the resulting machine code can only be run on compatible hardware and can only be run within a compatible operating system.
The Java compilation model takes human-written source code as input and emits not machine code, but so-called "byte code". Byte code cannot be directly executed on a machine. Instead, it needs to be translated once again by another compiler to machine code, or interpreted on-the-fly by a device that executes instructions on the machine that correspond to the instructions in the byte code. The latter device is often referred to as a Virtual Machine. One ramification of this model is that the byte code can be "run" on any platform that has either a byte code compiler or virtual machine written for it. This gives Java the appearance and effect of complete portability, where there is no such portability implied by the machine code emitted by a C++ compiler stack.
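To make "interpreted on-the-fly by a device" more concrete, here is a toy stack machine in Java. It is emphatically not the JVM: the four opcodes and the Insn record are invented for illustration (loosely modelled on the way JVM bytecode uses an operand stack), and the sketch assumes Java 17 or later for records and arrow-style switch. The run method plays the role of the virtual machine: it reads each instruction and carries it out with ordinary CPU operations.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack machine, NOT the JVM. Opcodes are invented for illustration.
public class ToyVm {

    enum Op { PUSH, ADD, MUL, RET }

    record Insn(Op op, int operand) {
        static Insn of(Op op) { return new Insn(op, 0); }
        static Insn push(int v) { return new Insn(Op.PUSH, v); }
    }

    // The "virtual machine": fetch each instruction and perform it on a stack.
    static int run(Insn[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : program) {
            switch (insn.op()) {
                case PUSH -> stack.push(insn.operand());
                case ADD  -> stack.push(stack.pop() + stack.pop());
                case MUL  -> stack.push(stack.pop() * stack.pop());
                case RET  -> { return stack.pop(); }
            }
        }
        throw new IllegalStateException("program ended without RET");
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, written the way a compiler for a stack machine might emit it.
        Insn[] program = {
            Insn.push(2), Insn.push(3), Insn.of(Op.ADD),
            Insn.push(4), Insn.of(Op.MUL), Insn.of(Op.RET)
        };
        System.out.println(run(program)); // prints 20
    }
}
```

The same Insn array runs unchanged on any machine that has a Java runtime, which is a small-scale version of the portability argument above.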
Two aspects play into the C (and C++) compilation model. One is its longer history than Java, meaning that it caters to very low-powered compilers and machines. The second is the compilation target, which is usually low-level machine code.
To target low-memory compiler environments, C code must be readable from top to bottom, with no backtracking. This means that you have to follow a strict discipline for the order of declarations. (C++ relaxes this a little bit for class definitions.) Furthermore, each source file must be compilable as an independent translation unit which need not know anything about other source files.
Second, because C targets low-level machine code, this means that each translation unit contains essentially no metadata, in stark contrast to Java class files. This necessitates a stronger coding discipline in which each translation unit must be provided with the necessary declarations. The compiler cannot just scan all the other files in order to get the required information; it is up to the user to supply it. (C++ enforces this more rigidly, in C you can get away with nasty errors by forgetting a declaration.)
Bear in mind that a C program has to be fully compiled and linked at compile time, so a lot of information has to be available already at that point. Java programs can load classes at runtime, and Java execution generally performs more "fitting" operations (casting, essentially, as opposed to static linking in C) at runtime. The more sophisticated runtime environment of Java allows for a more flexible and modular compilation model.
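A small illustration of loading classes at runtime: the class to instantiate is chosen by a string while the program runs, and the cast to List is checked at that point rather than at compile time. Using ArrayList and LinkedList as the loaded classes is just a convenient choice from the standard library.

```java
import java.util.List;

// Sketch: pick and load a class by name while the program is running.
public class LoadAtRuntime {

    @SuppressWarnings("unchecked")
    static List<String> newListOf(String className) throws ReflectiveOperationException {
        // The class is located and loaded now, not when this file was compiled.
        Class<?> cls = Class.forName(className);
        // Instantiate via the no-arg constructor. The cast to List is checked
        // at run time; the String element type is not (generics are erased).
        return (List<String>) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newListOf(name);
        list.add("loaded at runtime");
        System.out.println(list.getClass().getName() + " -> " + list);
    }
}
```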
I am going to be brave and compare performance. ;)
The Java compiler, javac, does little optimisation; it concentrates on checking the code. It does all the reasonable checks required to ensure the code will run on a JVM, plus some constant evaluation, and that's about it.
Most of the smart compilation is done by the JIT, which can perform dynamic compilation based on how the program is actually used. This allows it to inline "virtual" methods, for example, even if the caller and callee are in different libraries.
The C/C++ compiler performs significant static analysis up front. This means a program will run at almost full speed right from the start. The CPU performs some dynamic optimisation with instruction re-ordering and branch prediction. While C/C++ lacks dynamic optimisation, it gains by making low-level access to the system much easier. (It's usually not impossible in Java, but low-level operations which are trivial in C/C++ can be complex and obscure in Java.) It also provides more ways to do the same thing, allowing you to choose the optimal solution to your problem.
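As a hedged example of the "some constant evaluation" javac does do: static final fields initialised with constant expressions are compile-time constants, so javac computes their values itself, and an if block guarded by a constant false is left out of the bytecode (Java's form of conditional compilation). The class and field names are made up.

```java
// Sketch of javac's constant evaluation. No JIT is involved in any of this;
// it happens when the .java file is compiled.
public class ConstantFolding {

    static final boolean DEBUG = false;                  // compile-time constant
    static final int SECONDS_PER_DAY = 24 * 60 * 60;     // javac folds this to 86400
    static final String GREETING = "Hello, " + "world";  // folded to one string literal

    static int secondsIn(int days) {
        if (DEBUG) {
            // javac omits this whole block from the bytecode, because DEBUG is a constant false.
            System.out.println("secondsIn(" + days + ")");
        }
        return days * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(GREETING + ": " + secondsIn(2) + " seconds in two days");
    }
}
```

If you want to check, javap -c ConstantFolding should show no call to println inside secondsIn, and 24 * 60 * 60 already reduced to a single constant.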
When Java is likely to be faster.
If your style of programming suits Java and you only use the sort of features Java supports, Java is likely to be marginally faster (due to dynamic compilation) i.e. you wouldn't use C/C++ to their full potential anyway.
If your code contains lots of dead code (possibly only known to be dead at run time) Java does a good job at eliminating this. (IMHO A high percentage of micro-benchmarks which suggest Java is faster than C++ are of this type)
You have a very limited time and/or resources to implement your application. (In which case an even higher level language might be better) i.e. You don't have time to optimise your code much and you need to write safe abstracted code.
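A sketch of the dead-code point in that list, and why it can make naive micro-benchmarks misleading. In sumDiscarded the loop's result is never used, so the JIT is permitted to remove the work; whether it actually does depends on the JVM version and warm-up, so no particular timings are claimed here. For real measurements, a harness such as JMH is the usual tool.

```java
// Sketch: the same loop with its result discarded vs. returned.
public class DeadCodeDemo {

    static void sumDiscarded(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        // 'sum' is dead here: nothing reads it, so an optimising JIT may drop the loop.
    }

    static long sumReturned(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum; // the caller observes the result, so it must actually be produced
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        for (int round = 0; round < 5; round++) { // several rounds so the JIT can kick in
            long t0 = System.nanoTime();
            sumDiscarded(n);
            long t1 = System.nanoTime();
            long s = sumReturned(n);
            long t2 = System.nanoTime();
            System.out.printf("discarded: %d ms, returned: %d ms (sum=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s);
        }
    }
}
```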
When C/C++ is likely to be faster.
If you use most of the functionality C/C++ provides. Something more advanced programmers tend to do.
If startup time matters.
If you need to be creative about algorithms or data structures.
If you can exploit a low level hardware feature, like direct access to devices.
In short, "classical" compilation (a provisional term used by the course material, since there's no standard word for it) is basically compiling against a real device (in our case, a machine with a physical processor). Java, instead, compiles code against a virtual device, which is software installed on a machine. The virtual device is what translates the code for, and targets, the real machine.
In this way your hardware is abstracted. This is why Java can work on "any" machine.
Basically, there are two kinds of magic. Machine magic is only understood by certain wizards. JVM Bytecode magic is understood by a special kind of wizard that you have to hire in order to make the machine wizard able to cast spells that make your computer do things. C and C++ compilers generally emit the machine kind, whereas Java compilers emit JVM Bytecode.
C/C++ gets compiled before execution.
Java gets compiled while executing.
Of course, neither language mandates a certain way of being compiled.
There is no difference. Both convert source code that a human understands, to a machine code that some machine understands. In Java's case it targets a virtual machine, i.e. a program instead of a piece of silicon.
Of course there's nothing to prevent a piece of silicon from understanding JVM byte code (in which case you could rename it from 'byte code' to 'machine code'). And conversely, there's nothing to prevent a compiler from converting C/C++ code to JVM byte code.
Both have a runtime and both require you to tell it which parts of the runtime you intend to use.
I really think you intended to ask a different question.
I'm trying to get a better understanding of the difference. I've found a lot of explanations online, but they tend towards the abstract differences rather than the practical implications.
Most of my programming experience has been with CPython (dynamic, interpreted) and Java (static, compiled). However, I understand that there are other kinds of interpreted and compiled languages. Aside from the fact that executable files can be distributed from programs written in compiled languages, are there any advantages/disadvantages to each type? Oftentimes, I hear people arguing that interpreted languages can be used interactively, but I believe that compiled languages can have interactive implementations as well, correct?
A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition "+" operation in your source code could be translated directly to the "ADD" instruction in machine code.
An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same "+" operation would be recognised by the interpreter at run time, which would then call its own "add(a,b)" function with the appropriate arguments, which would then execute the machine code "ADD" instruction.
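A toy interpreter that does exactly what that paragraph describes, sketched in Java: it recognises "+" while running and calls its own add(a, b). The input format (integers joined by +) is an assumption to keep it short.

```java
// Toy interpreter for sums like "1 + 2 + 3".
public class PlusInterpreter {

    // The interpreter's own implementation of "+". It ultimately runs as the
    // CPU's add instruction, via the JVM that is running the interpreter.
    static long add(long a, long b) {
        return a + b;
    }

    static long interpret(String source) {
        String[] tokens = source.trim().split("\\s*\\+\\s*");
        long result = Long.parseLong(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            // Each '+' in the source becomes a call to add(), decided at run time.
            result = add(result, Long.parseLong(tokens[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interpret("1 + 2 + 3")); // prints 6
    }
}
```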
You can do anything that you can do in an interpreted language in a compiled language and vice-versa - they are both Turing complete. Both however have advantages and disadvantages for implementation and use.
I'm going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:
Faster performance by directly using the native code of the target machine
Opportunity to apply quite powerful optimisations during the compile stage
And here are the advantages of interpreted languages:
Easier to implement (writing good compilers is very hard!!)
No need to run a compilation stage: can execute code directly "on the fly"
Can be more convenient for dynamic languages
Note that modern techniques such as bytecode compilation add some extra complexity - what happens here is that the compiler targets a "virtual machine" which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
A language itself is neither compiled nor interpreted; only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpreter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?
Other languages that are both compiled and interpreted include Scala, Haskell and OCaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.
So generally categorizing languages by "compiled" and "interpreted" doesn't make much sense.
Start thinking in terms of a blast from the past:
Once upon a time, long long ago, there lived in the land of computing interpreters and compilers. All kinds of fuss ensued over the merits of one over the other. The general opinion at that time was something along the lines of:
Interpreter: Fast to develop (edit and run). Slow to execute because each statement had to be interpreted into machine code every time it was executed (think of what this meant for a loop executed thousands of times).
Compiler: Slow to develop (edit, compile, link and run. The compile/link steps could take serious time). Fast to execute. The whole program was already in native machine code.
A one or two order of magnitude difference in the runtime performance existed between an interpreted program and a compiled program. Other distinguishing points, run-time mutability of the code for example, were also of some interest but the major distinction revolved around the run-time performance issues.
Today the landscape has evolved to such an extent that the compiled/interpreted distinction is pretty much irrelevant. Many compiled languages call upon run-time services that are not completely machine code based. Also, most interpreted languages are "compiled" into byte-code before execution. Byte-code interpreters can be very efficient and rival some compiler generated code from an execution speed point of view.
The classic difference is that compilers generated native machine code, interpreters read source code and generated machine code on the fly using some sort of run-time system.
Today there are very few classic interpreters left - almost all of them compile into byte-code (or some other semi-compiled state) which then runs on a virtual "machine".
The extreme and simple cases:
A compiler will produce a binary executable in the target machine's native executable format. This binary file contains all required resources except for system libraries; it's ready to run with no further preparation and processing and it runs like lightning because the code is the native code for the CPU on the target machine.
An interpreter will present the user with a prompt in a loop where he can enter statements or code, and upon hitting RUN or the equivalent the interpreter will examine, scan, parse and interpretatively execute each line until the program runs to a stopping point or an error. Because each line is treated on its own and the interpreter doesn't "learn" anything from having seen the line before, the effort of converting human-readable language to machine instructions is incurred every time for every line, so it's dog slow. On the bright side, the user can inspect and otherwise interact with his program in all kinds of ways: Changing variables, changing code, running in trace or debug modes... whatever.
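A toy version of that classic interpreter, as a hedged sketch: the two statement forms (NAME = expression, or a bare expression to print) and the expression syntax (integers and names joined by +) are invented for illustration. Each line is split and parsed from scratch every time it is executed, which is the repeated cost described above, while the variable map survives between lines, which is what makes interactive inspection and changes possible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Toy line-at-a-time interpreter with a persistent environment.
public class LineInterpreter {

    // Variables live here between lines, so the user can inspect or change them.
    final Map<String, Long> vars = new HashMap<>();

    // Evaluates one token: an integer literal or the current value of a variable.
    private long value(String token) {
        if (token.matches("-?\\d+")) return Long.parseLong(token);
        Long v = vars.get(token);
        if (v == null) throw new IllegalArgumentException("undefined: " + token);
        return v;
    }

    // Scans, parses and executes a single line from scratch. Returns text to show.
    String execute(String line) {
        String[] assign = line.split("=", 2);
        String expr = assign.length == 2 ? assign[1] : assign[0];
        long result = 0;
        for (String term : expr.trim().split("\\s*\\+\\s*")) {
            result += value(term);
        }
        if (assign.length == 2) {
            vars.put(assign[0].trim(), result);
            return "";
        }
        return Long.toString(result);
    }

    public static void main(String[] args) {
        LineInterpreter repl = new LineInterpreter();
        Scanner in = new Scanner(System.in);
        System.out.print("> ");
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (!line.isEmpty()) {
                try {
                    String out = repl.execute(line);
                    if (!out.isEmpty()) System.out.println(out);
                } catch (RuntimeException e) {
                    System.out.println("error: " + e.getMessage());
                }
            }
            System.out.print("> ");
        }
    }
}
```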
With those out of the way, let me explain that life ain't so simple any more. For instance,
Many interpreters will pre-compile the code they're given so the translation step doesn't have to be repeated again and again.
Some compilers compile not to CPU-specific machine instructions but to bytecode, a kind of artificial machine code for a fictitious machine. This makes the compiled program a bit more portable, but requires a bytecode interpreter on every target system.
The bytecode interpreters (I'm looking at Java here) have recently tended to re-compile the bytecode they get for the CPU of the target system just before execution (called JIT). To save time, this is often only done for code that runs often (hotspots).
Some systems that look and act like interpreters (Clojure, for instance) compile any code they get, immediately, but allow interactive access to the program's environment. That's basically the convenience of interpreters with the speed of binary compilation.
Some compilers don't really compile, they just pre-digest and compress code. I heard a while back that's how Perl works. So sometimes the compiler is just doing a bit of the work and most of it is still interpretation.
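The first point in that list (pre-compiling so the translation step isn't repeated) can be sketched in Java by parsing a script once into composed functions and then only running those. The mini-language here (one add N or mul N per line) is hypothetical, and real interpreters use bytecode or syntax trees rather than java.util.function closures; assumes Java 14 or later for the switch syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

// Sketch: scan and parse the source once, then run the result many times.
public class PrecompiledScript {

    static LongUnaryOperator compile(String source) {
        List<LongUnaryOperator> steps = new ArrayList<>();
        for (String line : source.strip().split("\\R")) {
            String[] parts = line.strip().split("\\s+");
            long n = Long.parseLong(parts[1]);
            switch (parts[0]) {
                case "add" -> steps.add(x -> x + n);
                case "mul" -> steps.add(x -> x * n);
                default -> throw new IllegalArgumentException("bad instruction: " + line);
            }
        }
        // Compose the steps into one function; no source text is looked at after this.
        LongUnaryOperator program = LongUnaryOperator.identity();
        for (LongUnaryOperator step : steps) {
            program = program.andThen(step);
        }
        return program;
    }

    public static void main(String[] args) {
        LongUnaryOperator program = compile("add 3\nmul 2");
        // Run the already-translated program many times without re-parsing it.
        long total = 0;
        for (long i = 0; i < 1_000; i++) {
            total += program.applyAsLong(i);
        }
        System.out.println(program.applyAsLong(4) + " " + total);
    }
}
```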
In the end, these days, interpreting vs. compiling is a trade-off, with time spent (once) compiling often being rewarded by better runtime performance, but an interpretative environment giving more opportunities for interaction. Compiling vs. interpreting is mostly a matter of how the work of "understanding" the program is divided up between different processes, and the line is a bit blurry these days as languages and products try to offer the best of both worlds.
From http://www.quora.com/What-is-the-difference-between-compiled-and-interpreted-programming-languages
There is no difference, because “compiled programming language” and
“interpreted programming language” aren’t meaningful concepts. Any
programming language, and I really mean any, can be interpreted or
compiled. Thus, interpretation and compilation are implementation
techniques, not attributes of languages.
Interpretation is a technique whereby another program, the
interpreter, performs operations on behalf of the program being
interpreted in order to run it. If you can imagine reading a program
and doing what it says to do step-by-step, say on a piece of scratch
paper, that’s just what an interpreter does as well. A common reason
to interpret a program is that interpreters are relatively easy to
write. Another reason is that an interpreter can monitor what a
program tries to do as it runs, to enforce a policy, say, for
security.
Compilation is a technique whereby a program written in one language
(the “source language”) is translated into a program in another
language (the “object language”), which hopefully means the same thing
as the original program. While doing the translation, it is common for
the compiler to also try to transform the program in ways that will
make the object program faster (without changing its meaning!). A
common reason to compile a program is that there’s some good way to
run programs in the object language quickly and without the overhead
of interpreting the source language along the way.
You may have guessed, based on the above definitions, that these two
implementation techniques are not mutually exclusive, and may even be
complementary. Traditionally, the object language of a compiler was
machine code or something similar, which refers to any number of
programming languages understood by particular computer CPUs. The
machine code would then run “on the metal” (though one might see, if
one looks closely enough, that the “metal” works a lot like an
interpreter). Today, however, it’s very common to use a compiler to
generate object code that is meant to be interpreted—for example, this
is how Java used to (and sometimes still does) work. There are
compilers that translate other languages to JavaScript, which is then
often run in a web browser, which might interpret the JavaScript, or
compile it to virtual machine or native code. We also have interpreters
for machine code, which can be used to emulate one kind of hardware on
another. Or, one might use a compiler to generate object code that is
then the source code for another compiler, which might even compile
code in memory just in time for it to run, which in turn . . . you get
the idea. There are many ways to combine these concepts.
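A minimal sketch of compilation in the quote's sense, translating a program in one language into an equivalent program in another. Both languages are invented for illustration: the source language is integer arithmetic with + and * (no parentheses), and the object language is a list of stack-machine instructions. The execute method is included only to check that the translated program "means the same thing" as the original.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: source language = infix + and * ; object language = stack instructions.
public class InfixCompiler {

    // Splitting on '+' first and '*' second preserves precedence: 2 + 3 * 4 = 2 + (3 * 4).
    static List<String> compile(String source) {
        List<String> out = new ArrayList<>();
        String[] terms = source.split("\\+");
        for (int i = 0; i < terms.length; i++) {
            compileProduct(terms[i], out);
            if (i > 0) out.add("add");
        }
        return out;
    }

    private static void compileProduct(String term, List<String> out) {
        String[] factors = term.split("\\*");
        for (int j = 0; j < factors.length; j++) {
            out.add("push " + Integer.parseInt(factors[j].trim()));
            if (j > 0) out.add("mul");
        }
    }

    // A tiny executor for the object language, used to check the translation.
    static long execute(List<String> code) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String insn : code) {
            if (insn.startsWith("push ")) stack.push(Long.parseLong(insn.substring(5)));
            else if (insn.equals("add")) stack.push(stack.pop() + stack.pop());
            else if (insn.equals("mul")) stack.push(stack.pop() * stack.pop());
            else throw new IllegalArgumentException("unknown instruction: " + insn);
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> code = compile("2 + 3 * 4");
        System.out.println(code);          // [push 2, push 3, push 4, mul, add]
        System.out.println(execute(code)); // 14
    }
}
```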
The biggest advantage of interpreted source code over compiled source code is PORTABILITY.
If your source code is compiled, you need to compile a different executable for each type of processor and/or platform that you want your program to run on (e.g. one for Windows x86, one for Windows x64, one for Linux x64, and so on). Furthermore, unless your code is completely standards compliant and does not use any platform-specific functions/libraries, you will actually need to write and maintain multiple code bases!
If your source code is interpreted, you only need to write it once and it can be interpreted and executed by an appropriate interpreter on any platform! It's portable! Note that an interpreter itself is an executable program that is written and compiled for a specific platform.
An advantage of compiled code is that it hides the source code from the end user (which might be intellectual property) because instead of deploying the original human-readable source code, you deploy an obscure binary executable file.
A compiler and an interpreter do the same job: translating a programming language to another programming language, usually closer to the hardware, often direct executable machine code.
Traditionally, "compiled" means that this translation happens all in one go, is done by a developer, and the resulting executable is distributed to users. Pure example: C++.
Compilation usually takes pretty long and tries to do lots of expensive optimization so that the resulting executable runs faster. End users don't have the tools and knowledge to compile stuff themselves, and the executable often has to run on a variety of hardware, so you can't do many hardware-specific optimizations. During development, the separate compilation step means a longer feedback cycle.
Traditionally, "interpreted" means that the translation happens "on the fly", when the user wants to run the program. Pure example: vanilla PHP. A naive interpreter has to parse and translate every piece of code every time it runs, which makes it very slow. It can't do complex, costly optimizations because they'd take longer than the time saved in execution. But it can fully use the capabilities of the hardware it runs on. The lack of a separate compilation step reduces feedback time during development.
But nowadays "compiled vs. interpreted" is not a black-or-white issue; there are shades in between. Naive, simple interpreters are pretty much extinct. Many languages use a two-step process where the high-level code is translated to a platform-independent bytecode (which is much faster to interpret). Then you have "just in time compilers" which compile code at most once per program run, sometimes cache results, and even intelligently decide to interpret code that's run rarely, and do powerful optimizations for code that runs a lot. During development, debuggers are capable of switching code inside a running program even for traditionally compiled languages.
First, a clarification: Java is not fully static-compiled and linked in the way C++ is. It is compiled into bytecode, which is then interpreted by a JVM. The JVM can go and do just-in-time compilation to the native machine language, but doesn't have to do it.
More to the point: I think interactivity is the main practical difference. Since everything is interpreted, you can take a small excerpt of code, parse and run it against the current state of the environment. Thus, if you had already executed code that initialized a variable, you would have access to that variable, etc. It really lends itself well to things like the functional style.
Interpretation, however, costs a lot, especially when you have a large system with a lot of references and context. By definition, it is wasteful because identical code may have to be interpreted and optimized twice (although most runtimes have some caching and optimizations for that). Still, you pay a runtime cost and often need a runtime environment. You are also less likely to see complex interprocedural optimizations because at present their performance is not sufficiently interactive.
Therefore, for large systems that are not going to change much, and for certain languages, it makes more sense to precompile and prelink everything, do all the optimizations that you can do. This ends up with a very lean runtime that is already optimized for the target machine.
As for generating executables, that has little to do with it, IMHO. You can often create an executable from a compiled language. But you can also create an executable from an interpreted language, except that the interpreter and runtime is already packaged in the executable and hidden from you. This means that you generally still pay the runtime costs (although I am sure that for some language there are ways to translate everything to a tree executable).
I disagree that all languages could be made interactive. Certain languages, like C, are so tied to the machine and the entire link structure that I'm not sure you can build a meaningful fully-fledged interactive version.
This is, I think, one of the most misunderstood things in computer science.
Interpretation and compilation are two completely different things, which can't be compared in this way.
Compilation is the process of translating one language into another language. There are a few types of compilation:
Compiling - Translate high-level language into machine/byte code (ex: C/C++/Java)
Transpiling - Translate high-level language into another high-level language (ex: TypeScript)
Interpretation is the process of actually executing the program. This may happen in a few different ways:
Machine level interpretation - This interpretation happens to the code which is compiled into machine code. Instructions are directly interpreted by the processor. Programming languages like C/C++ generate machine code, which is executable by the processor. So the processor can directly execute these instructions.
Virtual machine level interpretation - This interpretation happens to code which is not compiled into machine-level (processor-supported) code, but into some intermediate-level code. Execution is done by another piece of software, which is itself executed by the processor. At this point the processor doesn't actually see our application; it is just executing the virtual machine, which is executing our application. Programming languages like Java, Python and C# generate byte code, which is executable by a virtual interpreter/machine.
So at the end of the day, what we have to understand is that every program in every programming language has to be interpreted at some point. This may be done by a processor (hardware) or a virtual machine (software).
The compilation is just the process of bringing the high-level code we write that is human-understandable into some hardware/software machine-understandable level.
These are two completely different things, which can't be directly compared. But the terminology is still quite useful for teaching beginners how programming languages work.
PS:
Some programming languages like Java take a hybrid approach. First, the high-level code is compiled into byte code, which the virtual machine can read. Then, on the fly, a component called the JIT compiler compiles byte code into machine code. Specifically, code that is executed many times gets translated into machine language, which makes execution much faster, because the hardware processor is much faster than a virtual interpreter/processor.
How Java JIT compiler works
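To make the "executed again and again" idea concrete, here is a toy model of hotspot detection in Java: a method counts its calls and, after a threshold, swaps a slow generic path for a fast one. This is only an analogy under stated assumptions; the threshold value is hypothetical, and real JVMs such as HotSpot compile bytecode to machine code with far more elaborate policies.

```java
import java.util.function.IntUnaryOperator;

// Toy model of hotspot detection, NOT how HotSpot is implemented.
public class TieredMethod {

    static final int THRESHOLD = 1_000; // hypothetical; real JVMs tune this themselves

    private int calls = 0;
    private boolean compiled = false;
    private IntUnaryOperator impl = this::interpretedSquare;

    // Slow path standing in for interpretation: squares by repeated addition.
    private int interpretedSquare(int x) {
        int a = Math.abs(x);
        int result = 0;
        for (int i = 0; i < a; i++) {
            result += a;
        }
        return result;
    }

    int call(int x) {
        if (!compiled && ++calls >= THRESHOLD) {
            // The method has become "hot": switch to the fast version, once.
            impl = v -> v * v;
            compiled = true;
        }
        return impl.applyAsInt(x);
    }

    boolean isCompiled() { return compiled; }

    public static void main(String[] args) {
        TieredMethod square = new TieredMethod();
        for (int i = 0; i < 2_000; i++) {
            square.call(i % 50);
        }
        System.out.println("compiled after 2000 calls: " + square.isCompiled());
    }
}
```

Both paths return the same results, mirroring the requirement that JIT compilation must not change what the program does, only how fast it does it.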
It's rather difficult to give a practical answer because the difference is about the language definition itself. It's possible to build an interpreter for every compiled language, but it's not possible to build a compiler for every interpreted language. It's very much about the formal definition of a language, so it's that theoretical informatics stuff nobody likes at university.
The Python Book (© 2015 Imagine Publishing Ltd) simply distinguishes the two with the following hint on page 10:
An interpreted language such as Python is one where the source code is converted to machine code and then executed each time the program runs. This is different from a compiled language such as C, where the source code is only converted to machine code once – the resulting machine code is then executed each time the program runs.
Compile is the process of creating an executable program from code written in a compiled programming language. Compiling allows the computer to run and understand the program without the need of the programming software used to create it. When a program is compiled it is often compiled for a specific platform (e.g. IBM platform) that works with IBM compatible computers, but not other platforms (e.g. Apple platform).
The first compiler was developed by Grace Hopper while working on the Harvard Mark I computer. Today, most high-level languages will include their own compiler or have toolkits available that can be used to compile the program. A good example of a compiler used with Java is Eclipse and an example of a compiler used with C and C++ is the gcc command. Depending on how big the program is it should take a few seconds or minutes to compile and if no errors are encountered while being compiled an executable file is created.
Short (un-precise) definition:
Compiled language: Entire program is translated to machine code at once, then the machine code is run by the CPU.
Interpreted language: Program is read line-by-line and as soon as a line is read the machine instructions for that line are executed by the CPU.
But really, few languages these days are purely compiled or purely interpreted, it often is a mix. For a more detailed description with pictures, see this thread:
What is the difference between compilation and interpretation?
Or my later blog post:
https://orangejuiceliberationfront.com/the-difference-between-compiler-and-interpreter/