How does the JVM distinguish between Scala bytecode and Java bytecode? - java

Scala also compiles to bytecode that is executed by the JVM, so I am wondering how the JVM distinguishes between Scala bytecode and Java bytecode. Can anyone please explain?
scalac Myprogram.scala
java Myprogram
So are these statements perfectly fine?

I am wondering how the JVM distinguishes between Scala bytecode and Java bytecode.
It doesn't. There is no such thing as Scala bytecode. The Scala compiler compiles to JVM bytecode. Just like the Java compiler also compiles to JVM bytecode.
The JVM doesn't know anything about Scala. It doesn't know anything about Java, either. Nor does it know anything about Groovy, Clojure, Kotlin, Ceylon, Fantom, Ruby, Python, ECMAScript, or any of the other ~400 programming languages for which there are implementations on the JVM.
The JVM only knows about one language: JVM bytecode.
Note that this is really no different from any other machine, virtual or not. The CLR only knows about CIL; it knows nothing about C#, VB.NET, or F#. An Intel Core CPU knows only about AMD64 and x86 machine code; it knows nothing about C, C++, Objective-C, Swift, Go, Java, or Python. The CPython VM only knows about CPython bytecode; it knows nothing about Python.
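To make the "one language" point concrete, here is a minimal sketch that decodes the first eight bytes of a class file. Every class file starts with the same magic number and a version, whichever compiler wrote it. The class name ClassHeader and the helper describe are made up for this illustration, and readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassHeader {

    /** Decodes the first eight bytes of a class file: magic number, minor and major version. */
    static String describe(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return String.format("magic=0x%08X major=%d minor=%d", magic, major, minor);
    }

    public static void main(String[] args) throws IOException {
        // Any class file on the classpath can be named here, whichever compiler produced it.
        String resource = args.length > 0 ? args[0] : "java/lang/Object.class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) throw new IOException("not found: " + resource);
            System.out.println(describe(in.readAllBytes())); // readAllBytes needs Java 9+
        }
    }
}
```

Passing the resource name of a class compiled by scalac (for example Myprogram.class, if it is on the classpath) should show the same 0xCAFEBABE magic number; only the version fields depend on the compiler's target setting.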

It doesn't. Scala compiles to the same bytecode as Java.

A picture is worth a thousand words

Both scalac and javac generate bytecode. The JVM doesn't care how the bytecode was produced, it's all the same to the JVM.
However, the scala and java launchers set up the boot CLASSPATH differently, so if your code contains Scala runtime library calls, and it very likely will, it should be run with scala, not java.
You can set up the boot CLASSPATH manually when using java, if you absolutely have to, but why go through all that extra work when scala will do it for you?

Scala compiles to normal Java bytecode, so the JVM doesn't see any difference. The extra features of Scala that Java doesn't have are implemented through a combination of compile-time passes and runtime helper functions. If you disassemble generated Scala classes, you'll probably see tons of calls to the Scala runtime for things like boxing and unboxing arguments.

The bytecode as defined in the specification looks the same from the executor's point of view. But when you work with a debugger, more information is needed to match the correct language and source files.
The only way to indicate the source from which JVM bytecode was generated is through the optional class attributes SourceFile (since Java 1.0.2) and SourceDebugExtension (since Java 5.0).
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.10 and 11
Additional information gives the line numbers of the various code elements in the source: the LineNumberTable attribute, which is also optional.
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.12
You can find the equivalent for other JVM specs.
These are used by IDEs to get the context of a JVM class, but they are optional, and the debug information depends on the compiler.
So it can hint at which compiler was used, but you cannot rely on it. If it is defined, it should be described in the compiler's documentation.
With scalac you can control what debugging information is stored with the -g option, the same as with javac except for notc, since Java doesn't do tail call optimization (TCO):
-g:{none,source,line,vars,notc}
"none" generates no debugging info,
"source" generates only the source file attribute,
"line" generates source and line number information,
"vars" generates source, line number and local variable information,
"notc" generates all of the above and will not perform tail call optimization.
By default line
To read this information from class files you can use the open-source library ASM: http://asm.ow2.org/
You can also parse the class file (or its in-memory representation) yourself; retrieving class attributes is not very difficult.
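As a rough illustration of parsing the class file yourself, the sketch below walks the class file layout from the JVM specification (constant pool, interfaces, fields, methods, class attributes) and returns the SourceFile attribute if present. The names SourceFileReader, readSourceFile and sourceFileOf are invented for this example, and error handling is kept minimal.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SourceFileReader {

    /** Returns the SourceFile attribute of a class file, or null if the compiler omitted it. */
    static String readSourceFile(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        skip(in, 4); // minor and major version
        int cpCount = in.readUnsignedShort();
        String[] utf8 = new String[cpCount]; // only Utf8 entries are kept
        for (int i = 1; i < cpCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;           // Utf8: u2 length + bytes
                case 5: case 6: skip(in, 8); i++; break;         // Long/Double use two slots
                case 15: skip(in, 3); break;                     // MethodHandle
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4); break;
                default: throw new IOException("unknown constant tag " + tag);
            }
        }
        skip(in, 6);                          // access_flags, this_class, super_class
        skip(in, 2 * in.readUnsignedShort()); // interface indices
        skipMembers(in);                      // fields
        skipMembers(in);                      // methods
        int attrCount = in.readUnsignedShort();
        for (int i = 0; i < attrCount; i++) {
            String name = utf8[in.readUnsignedShort()];
            int length = in.readInt();
            if ("SourceFile".equals(name)) return utf8[in.readUnsignedShort()];
            skip(in, length);
        }
        return null;
    }

    /** Skips a field or method table, including the attributes of each entry. */
    static void skipMembers(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            skip(in, 6); // access_flags, name_index, descriptor_index
            int attrs = in.readUnsignedShort();
            for (int a = 0; a < attrs; a++) {
                skip(in, 2);            // attribute_name_index
                skip(in, in.readInt()); // attribute body
            }
        }
    }

    static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    /** Looks up a class by binary name (e.g. "java.lang.String") on the system class loader. */
    static String sourceFileOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) throw new IOException("not found: " + resource);
            return readSourceFile(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sourceFileOf(args.length > 0 ? args[0] : "java.lang.String"));
    }
}
```

For a class compiled by scalac with source information enabled, the returned name would typically end in .scala, which is the only hint of the original language; as noted above, the attribute is optional and cannot be relied on.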

Related

Writing languages for the JVM

Suppose I write a programming language; for the sake of a name, I'll call it lang.
To begin the long journey of writing lang, I decide to start by writing lang in itself. I can't actually run it, because there's nothing to run the program that runs itself.
So I begin by writing another compiler for lang in Java. This time, when I am done, I decide to convert it to Bytecode, and leave it at that. I now have a working compiler, which will convert all my lang code into Bytecode.
So I decide to plug in my self-compiler for the language, into the compiler I just made in Java. I then convert the self-compiler to Bytecode, and chuck out the Java compiler. I now have a lang compiler, purely written in itself, converted into Bytecode, ready for use.
This creates a solid program, and I understand all of this, but my question is, relative to compiler design for the JVM, what if I decide to release an update for my language? How do I go about updating the Bytecode? Do I simply re-write the updated version of the language in the older one?
I ask this because this is what I want to do. Write a non-existing language in itself, and then bootstrap it to the JVM by firstly creating a compiler in Java.
It's the same as what was done with C++. C with Classes was written, and then C++ in it, and finally C with Classes was abandoned for the bootstrapped C++. But then how on earth did they ever go about updating the language?
I'll answer this from two possible scenarios in your development. With any byte-code language at any time you can update the virtual machine or the language.
Suppose first you wanted to update your language to have new syntax or change the current semantics. Then you'd keep your current compiled compiler written in lang (compiler A) and edit its source so that it can correctly compile your new features. Then you compile your compiler using the old one giving you compiler B. If necessary, you can now rewrite the compiler to use the new features and then compile it using compiler B to give you compiler C.
What if the JVM changes? Well, in that case you keep an old version of the JVM around, adjust your compiler to cope with the new bytecode changes, and then compile it with the old one (this is analogous to compiler B from before). That will get you a compiler that compiles to the new bytecode but runs on the old VM. The next step is to get it to compile itself, and now you have a new compiler that runs on the new VM (analogous to compiler C).
I don't think your compiler is the best way to go about this.
I'd start with a grammar for my language.
Next comes the lexer/parser to turn expressions in my language to an abstract syntax tree (AST). The AST is a correct intermediate representation of the expression.
You would emit bytecode or assembly language instructions for the virtual machine or processor of your choice by writing a code generator that traverses the AST.
Where does your update happen?
If it's language fundamentals, you have to modify both the grammar and the bytecode emission.
If you're optimizing the bytecode or porting to a new processor you have to modify the code generator.
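The grammar, parser, AST and code generator pipeline described above can be sketched in a few dozen lines. This is only a toy, assuming a made-up grammar with integer literals, +, * and parentheses; the emitted lines are JVM-style mnemonics printed as text, not a real class file, and all names (TinyCompiler, Num, BinOp, compile) are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {
    // AST: either a number literal or a binary operation.
    interface Node {}
    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }

    // Recursive-descent parser for the grammar:
    //   expr   := term ('+' term)*
    //   term   := factor ('*' factor)*
    //   factor := digits | '(' expr ')'
    private final String src;
    private int pos;
    TinyCompiler(String src) { this.src = src.replace(" ", ""); }

    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }

    Node expr() {
        Node left = term();
        while (peek('+')) { pos++; left = new BinOp('+', left, term()); }
        return left;
    }
    Node term() {
        Node left = factor();
        while (peek('*')) { pos++; left = new BinOp('*', left, factor()); }
        return left;
    }
    Node factor() {
        if (peek('(')) {
            pos++;
            Node inner = expr();
            if (!peek(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return new Num(Integer.parseInt(src.substring(start, pos))); // throws on malformed input
    }

    // Code generator: post-order traversal, so operands are pushed before their operator.
    static void emit(Node n, List<String> out) {
        if (n instanceof Num) {
            out.add("ldc " + ((Num) n).value);
        } else {
            BinOp b = (BinOp) n;
            emit(b.left, out);
            emit(b.right, out);
            out.add(b.op == '+' ? "iadd" : "imul");
        }
    }

    static List<String> compile(String source) {
        List<String> out = new ArrayList<>();
        emit(new TinyCompiler(source).expr(), out);
        return out;
    }

    public static void main(String[] args) {
        compile("1 + 2 * 3").forEach(System.out::println);
    }
}
```

In this layout a syntax change touches the parser methods, while a change of target (different instructions, another VM) only touches emit, which matches the split between grammar and code generator described in the answer.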
The first lang compiler can be written in a subset of lang, and you only need a subset (bootstrap) compiler (or even an interpreter). This can be written in Java.
Later, more extensive compilers can be written in lang. Newer versions can be, too.
You could even write a translator that converts a lang program to java, and use that to create a first translator in lang, and then turn it into a bytecode compiler.

Are JVM implemented languages like Jython using Java underneath or are they using the JVM native?

In a language that uses the JVM, say Jython, JRuby or any language that isn't Java specifically, is Java the language being used "underneath" somewhere?
Does the implementation mean:
Language Ported to use the JVM + Java somewhere + JVM?
For example, was Jython written in Java, or does it use something else to utilize the JVM?
It depends.
Part of a language's standard library may be implemented in Java. Ditto for the compiler/interpreter. Other parts that are not essential for bootstrapping may even be written in the language itself.
The user code itself may initially be run through an interpreter but later compiled to bytecode. Additionally, the generated bytecode may be optimized further based on type profiles gathered at runtime. And the bytecode may bail out back to the interpreter if some of its assumptions are invalidated. This is analogous, albeit at a higher abstraction level, to HotSpot's interpreter/C1/C2 tiers and many other JIT environments.
But interpreter+JIT is just one possible approach. Scala for example is AOT-compiled to bytecode.
And they may also use C bindings to implement functions of the respective language's standard library that have no 1:1 mapping to the JDK standard library.
No, they do not (in the general case) compile down to Java.
For example, Jython compiles Python to Java bytecode that is then run on the JVM. This is the same bytecode Java compiles to, and JRuby does the same for Ruby. While it is true that both Jython and JRuby are themselves largely written in Java, the actual Python/Ruby program being run is not compiled to Java but directly to Java bytecode.

Difference between C++ and Java compilation process [duplicate]

Closed 13 years ago.
Possible Duplicate:
Why does C++ compilation take so long?
Hi,
I searched Google for the differences between the C++ and Java compilation processes, but only C++ and Java language features and their differences were returned.
I am proficient in Java, but not in C++, though I have fixed a few bugs in C++. From my experience, I noticed that C++ always took more time to build than Java for minor changes.
Regards
Bala
There are a few high-level differences that come to my mind. Some of those are generalizations and should be prefixed with "Often ..." or "Some compilers ...", but for the sake of readability I'll leave that out.
C/C++ compilation doesn't read any information from binary files, but reads method/type definitions only from header files that need to be parsed in full (exception: precompiled headers)
C/C++ compilation includes a pre-processor step that can do a wide array of text-replacement (which makes header pre-compilation harder to do)
The C++ syntax is a lot more complex than the Java syntax
The C++ type system is a lot more complex than the Java type system
C++ compilation usually produces native assembler code, which is a lot more complex to produce than the relatively simple byte code
C++ compilers need to do optimizations because there isn't any other thing that will do them. The Java compiler pretty much does a simple 1:1 translation of Java source code to Java byte code, no optimizations are done at that step (that's left for the JVM to do).
C++ has a template language that's Turing complete! (so strictly speaking C++ code needs to be run to produce executable code and a C++ compiler would need to solve the halting problem to tell you if arbitrary C++ code is compilable).
Java compiles code into bytecode, which is interpreted by the Java VM. C++ compiles into object code, which must then be linked into a machine-code executable (or DLL). Because of this, Java can recompile only a single class after a minor change, while C++ object files must be re-linked with the other object files. This may make the process take a bit longer.
I am not sure why you expect the compilation speed of Java and C++ to be comparable since they are different languages with completely different design goals and implementations.
That said a few specific differences to keep in mind are:
Java is compiled to byte code and not right down to machine code. Compiling to this abstract virtual machine is simpler.
C++ compilation involves not only compilation but also linking. So it is typically a multi step process.
Java performs some late binding; that is, the association between a call to a function and the actual code to run is made at runtime. So a small change in one area need not trigger a compile of the whole program. In C++ this association needs to be made at compile time; this is called early binding.
A C++ program using all the language's features is inherently more difficult to compile. A few template invocations with a number of types can easily double or triple the amount of code to generate.
Glossing over a lot of details, in Java you compile .java files into one or more .class files. In C++ you compile .cc (or whatever) source files into .o files, and then link the .o files together into an executable or library. The linking process is usually what kills you, especially for minor changes as the amount of work for linking is roughly proportional to the size of your entire project. (this is ignoring incremental linkers, which are specifically designed to not behave as badly for small changes)
Another factor is that the #include mechanism means that whenever you change a .h file, all of the .o files that depend on it need to be rebuilt. In Java, a .class file can depend on more than one .java file (eg: because of constant inlining), but there tend to be far fewer of these "hot spots" where changing one source file requires many other source files to be rebuilt.
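The constant inlining mentioned above can be observed from inside a program. In this sketch (class and field names are invented), reading a compile-time constant does not even initialize the class that declares it, because javac copied the value into the caller. If Limits lived in its own source file and MAX changed, every class that uses it would have to be recompiled to pick up the new value.

```java
public class ConstantInlining {
    static boolean limitsInitialized = false;

    static class Limits {
        static final int MAX = 10;        // compile-time constant: inlined into users
        static final Integer BOXED = 10;  // not a constant expression: read at runtime
        static { limitsInitialized = true; }
    }

    static String demo() {
        int inlined = Limits.MAX;                  // no access to Limits at runtime
        boolean afterConstant = limitsInitialized;
        int boxed = Limits.BOXED;                  // real field access, initializes Limits
        boolean afterField = limitsInitialized;
        return inlined + " " + afterConstant + " " + boxed + " " + afterField;
    }

    public static void main(String[] args) {
        // In a fresh JVM this prints: 10 false 10 true
        System.out.println(demo());
    }
}
```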
Also, if you're using an IDE like Eclipse it's building your Java code in the background all the time, so by the time you tell it to build it's already mostly (if not completely) done.
Java compiles source code into bytecode, which is interpreted by the JVM. Because of this, it can be used on multiple platforms.

Understanding Java Byte Code

Often I am stuck with a Java class file with no source, and I am trying to understand the problem I have at hand.
Note that a decompiler is useful but not sufficient in all situations...
I have two questions:
What tools are available to view Java bytecode (preferably available from the Linux command line)?
What are good references to get familiar with Java bytecode syntax?
Rather than looking directly at the Java bytecode, which will require familiarity with the Java virtual machine and its operations, one could try to use a Java decompiling utility. A decompiler will attempt to create a java source file from the specified class file.
The related question "How do I “decompile” Java class files?" is informative for finding out how to decompile Java class files.
That said, one could use the javap command which is part of the JDK in order to disassemble Java class files. The output of javap will be the Java bytecode contained in the class files. But do be warned that the bytecode does not resemble the Java source code at all.
The definitive source for learning about Java bytecode and the Java Virtual Machine itself would be The Java Virtual Machine Specification, Second Edition. In particular, Chapter 6: The Java Virtual Machine Instruction Set has an index of all the bytecode instructions.
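To give a feel for how bytecode relates to source, here is a tiny method together with the instructions javap -c shows for it, written as comments. The class name Add is just an example; the offsets and mnemonics are what javac normally produces for this method.

```java
public class Add {
    static int add(int a, int b) {
        return a + b;
    }
    // "javap -c Add" lists roughly the following for add(int, int):
    //   0: iload_0    // push the first int argument onto the operand stack
    //   1: iload_1    // push the second int argument
    //   2: iadd       // pop both, push their sum
    //   3: ireturn    // return the int on top of the stack

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

The JVM is a stack machine, so every expression turns into pushes followed by an operation; reading bytecode mostly means tracking what is on the operand stack.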
To view bytecode instruction of class files, use the javap -v command, the same way as if you run a java program, specifying classpath (if necessary) and the class name.
Example:
javap -v com.company.package.MainClass
About the bytecode instruction set,
Instruction Set Summary
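If you would rather drive the disassembler from code than from the shell, javap is also exposed through the java.util.spi.ToolProvider service. This sketch assumes Java 9 or later and a full JDK (not a bare runtime image); the class name Disassemble and the helper javap are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.spi.ToolProvider;

public class Disassemble {

    /** Runs javap with the given arguments and returns everything it printed. */
    static String javap(String... args) {
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not available; a full JDK is required"));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        tool.run(out, out, args); // stdout and stderr both go to the buffer
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Same as running: javap -c java.lang.Object
        System.out.println(javap("-c", "java.lang.Object"));
    }
}
```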
Fernflower is an analytical decompiler, so it will decompile classes to readable Java code instead of bytecode. It's much more useful when you want to understand how code works.
If you have a class and no source code, but you have a bug, you can do one of two basic things:
Decompile, fix the bug and recreate the jar file. I have done this before, but sysadmins are leery about putting that into production.
Write unit tests for the class, determine what causes the bug, report the bug with the unit tests and wait for it to be fixed.
(2) is generally the one that sysadmins, in my experience, prefer.
If you go with (2) then, in the meantime, since you know what causes the bug, you can either not allow that input to go to the class, to prevent a problem, or be prepared to properly handle it when the error happens.
You can also use AspectJ to inject code into the problem class and change the behavior of the method without actually recompiling. I think this may be the preferable option, as you can change it for all code that may call the function, without worrying about teaching everyone about the problem.
If you learn to read the bytecode instructions, what will you do to solve the problem?
I have two questions
1) What tools are available to view java byte code (preferably available from the linux command line )
The javap tool (with the -c option) will disassemble a bytecode file. It runs from the command line, and is supplied as part of the Java SDK.
2) What are good references to get familiar with java byte code syntax
The javap tool uses the same syntax as is used in the JVM specification, and the JVM spec is naturally the definitive source. I also spotted "Inside the Java Virtual Machine" by Bill Venners. I've never read it, and it looks like it might be out of print.
The actual (textual) syntax is simple and self-explanatory ... assuming that you have a reference that explains what the bytecodes do, and that you are moderately familiar with reading code at this level. But it is likely to be easier to read the output of a decompiler, even if the bytecodes have been fed through an obfuscator.
You might find the Eclipse Byte Code Outline plugin useful:
http://andrei.gmxhome.de/bytecode/index.html
I have not used it myself - just seen it mentioned in passing.

Compiler to translate Java bytecode to platform-independent C code before runtime?

I'm looking for a compiler to translate Java bytecode to platform-independent C code before runtime (Ahead-of-Time compilation).
I should then be able to use a standard C compiler to compile the C code into an executable for the target platform. I understand this approach is suitable only for certain Java applications that are modified infrequently.
So what Java-to-C compilers are available?
I could suggest a tool called JCGO, which is a Java source to C translator. If you need to convert bytecode, you can decompile the class files with some tool (e.g., JadRetro+Jad) and pass the source files to JCGO. The tool translates all the classes of your Java program at once and produces C files (one .c and one .h for each class), which can then be compiled (by third-party tools) into highly optimized native code for the target platform. Java generics are not supported yet. AWT/Swing and SWT are supported.
Why do that? The Java virtual machine includes a runtime Java-to-assembly compiler.
Compilation at runtime can yield better performance, since all information about runtime values is available, while ahead-of-time compilation has to make assumptions about runtime values and thus may emit slower code. Please refer to Java vs C performance by Cliff Click for more details.
GCJ has this capability, but it hasn't got great support for Java features past 1.4, and Swing support is likely to be troublesome. In practice though, the HotSpot JIT compiler beats all the ahead-of-time compilers for Java. See benchmarks from Excelsior JET.
To clarify: GCJ converts java source/bytecode to natively compiled code
Toba will convert (old) Java bytecode to C source. However, it hasn't been updated since Java 1.1. It may be helpful to partially facilitate the porting, but it just can't handle all the complex libraries Java has.
https://github.com/badlogic/jack -- Java to C++ transpiler, ignores memory model and other stuff, uses Boehm GC for extra slowness and GC pauses
The license is unclear to me.
http://ptolemy.eecs.berkeley.edu/publications/papers/03/java-2-C/ -- A Retargetable Optimizing Java-to-C Compiler for Embedded Systems
A paper, not sure whether the program is available.
(I've been googling for this stuff, this is how I came to this question at SO.)
AFAIK, there is no such product but you have two options:
Implement your own byte-code to C transpiler. Byte-code is pretty simple, this isn't too hard.
If you just want a native binary (i.e. when you don't need the C source code), then give GCJ a try.
Note: If you're doing this for performance reasons, then you're going to be disappointed. Java is generally as fast as C/C++. Moreover, improvements to the VM will make all Java code faster but not your native binary. Compiling the code will just give you a little better startup time.
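To illustrate why a bytecode-to-C translator is approachable, here is a deliberately tiny sketch that turns straight-line integer bytecode into a C function by simulating the operand stack with expression strings. It takes mnemonics as text rather than parsing a real class file, supports only a handful of instructions, and ignores everything hard (objects, exceptions, GC, and the fact that Java int arithmetic wraps on overflow while signed overflow in C is undefined). The names BytecodeToC and translate are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BytecodeToC {

    /** Translates a static method with int arguments and straight-line int bytecode to C. */
    static String translate(String name, int argCount, String... code) {
        Deque<String> stack = new ArrayDeque<>(); // operand stack of C expressions
        StringBuilder c = new StringBuilder("int " + name + "(");
        for (int i = 0; i < argCount; i++) {
            c.append(i == 0 ? "" : ", ").append("int v").append(i); // local slot i becomes v<i>
        }
        c.append(") { ");
        for (String insn : code) {
            switch (insn) {
                case "iload_0": case "iload_1": case "iload_2": case "iload_3":
                    stack.push("v" + insn.charAt(6)); // the digit after "iload_"
                    break;
                case "iadd": case "isub": case "imul": {
                    String right = stack.pop();
                    String left = stack.pop();
                    String op = insn.equals("iadd") ? "+" : insn.equals("isub") ? "-" : "*";
                    stack.push("(" + left + " " + op + " " + right + ")");
                    break;
                }
                case "ireturn":
                    c.append("return ").append(stack.pop()).append("; ");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return c.append("}").toString();
    }

    public static void main(String[] args) {
        // prints: int add(int v0, int v1) { return (v0 + v1); }
        System.out.println(translate("add", 2, "iload_0", "iload_1", "iadd", "ireturn"));
    }
}
```

The instruction decoding really is the easy part; most of the effort in tools like the ones listed in this thread goes into the runtime library, memory management and the rest of the Java semantics.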
Not really an answer to my own question, but how does Oracle do it?
http://download.oracle.com/docs/cd/B28359_01/java.111/b31225/chone.htm#BABCIHGA
There used to be a product called TowerJ, which was essentially a "via C" static compiler for Java, but it is long gone.
I was told that Sun Labs has created something like this as part of the Sun SPOT project, but I am not sure if it is public.
@BobMcGee: In the benchmarks you refer to, GCJ indeed loses, but Excelsior JET (which is a 32-bit AOT compiler) beats the 32-bit HotSpot on all three test systems, so I am not sure what your point was.
But, after all, there are lies, damn lies, and benchmarks. :)
