Difference between C++ and Java compilation process [duplicate] - java

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Why does C++ compilation take so long?
Hi,
I searched in google for the differences between C++ and Java compilation process, but C++ and Java language features and their differences are returned.
I am proficient in Java, but not in C++. But I fixed few bugs in C++. From my experience, I noticed that C++ always took more time to build compared to Java for minor changes.
Regards
Bala

There are a few high-level differences that come to my mind. Some of those are generalizations and should be prefixed with "Often ..." or "Some compilers ...", but for the sake of readability I'll leave that out.
C/C++ compilation doesn't read any information from binary files, but reads method/type definitions only from header files that need to be parsed in full (exception: precompiled headers)
C/C++ compilation includes a pre-processor step that can do a wide array of text-replacement (which makes header pre-compilation harder to do)
The C++ syntax is a lot more complex than the Java syntax
The C++ type system is a lot more complex than the Java type system
C++ compilation usually produces native assembler code, which is a lot more complex to produce than the relatively simple byte code
C++ compilers need to do optimizations because there isn't any other thing that will do them. The Java compiler pretty much does a simple 1:1 translation of Java source code to Java byte code, no optimizations are done at that step (that's left for the JVM to do).
C++ has a template language that's Turing complete! (so strictly speaking C++ code needs to be run to produce executable code and a C++ compiler would need to solve the halting problem to tell you if arbitrary C++ code is compilable).

Java compiles code into bytecode, which is interpreted by the Java VM. C++ must compile into object code, then to machine language. Because of this, it's possible for Java to compile only a single class for minor changes, while C++ object files must be re-linked with other object files to machine code executable (or DLLs). This may make the process take a bit longer.

I am not sure why you expect the compilation speed of Java and C++ to be comparable since they are different languages with completely different design goals and implementations.
That said a few specific differences to keep in mind are:
Java is compiled to byte code and not right down to machine code. Compiling to this abstract virtual machine is simpler.
C++ compilation involves not only compilation but also linking. So it is typically a multi step process.
Java performs some late binding that is the association of a call to a function and the actual code to run is done at runtime. So a small change in one area need not trigger a compile of the whole program. In C++ this association needs to be done at compile time this is called early binding.

A C++ program using all the language's features is inherently more difficult to compile. A few template invocations with a number of types can easily double or triple the amount of code to generate.

Glossing over a lot of details, in Java you compile .java files into one or more .class files. In C++ you compile .cc (or whatever) source files into .o files, and then link the .o files together into an executable or library. The linking process is usually what kills you, especially for minor changes as the amount of work for linking is roughly proportional to the size of your entire project. (this is ignoring incremental linkers, which are specifically designed to not behave as badly for small changes)
Another factor is that the #include mechanism means that whenever you change a .h file, all of the .o files that depend on it need to be rebuilt. In Java, a .class file can depend on more than one .java file (eg: because of constant inlining), but there tend to be far fewer of these "hot spots" where changing one source file requires many other source files to be rebuilt.
Also, if you're using an IDE like Eclipse it's building your Java code in the background all the time, so by the time you tell it to build it's already mostly (if not completely) done.

Java compiles any source code into bytecode, which is interpreted by JVM. Because of this feature it can be used in multiple platform.

Related

How to compile Java files that put .class files directly into JAR

First some reference:
1st Link
2nd link
The first article 1st Link mentions about compiling the Java files directly into JAR files and avoiding one step in the build process. Does anyone know this?
-Vadiraj
As you linked to my blog post I thought it was only fair to give you an update.
Compiling directly to a Jar is actually fairly simple to do. Basically you extend
javax.tools.ForwardingJavaFileObject
Then override openOutputStream method and direct it to your Jar. As the Java Compiler is highly concurrent but writing to a jar file is highly sequential I'd recommend that you buffer to byte arrays and then have a background thread that writes the byte arrays as they arrive.
I do exactly this is my experimental build tool JCompilo https://code.google.com/p/jcompilo/
This tool is completely undocumented but crazy fast. Currently it's about 20-80% faster than any other Java build tool and about 10x faster than the Scala compiler for the same code.
As the author is talking about extending the compiler itself, it is possible that he has knowledge of the built-in capabilities of the compiler (that is what the compiler is capable of, maybe with a little encouragement by tweaking the code).
Right now I’m investigating extending the Java 6 compiler to remove the unneeded file exists checks and possible jaring the class files directly in the compiler. [emphasis mine]
That capability, however, is certainly not supported officially (no documentation exist about it on the javac webpage).
At best, the feature is compiler dependent; possibly requiring modification of the compiler's source code.

Typical build process of Java application

Being new to Java, I am not able to understand the complete build process from source code to hardware specific binaries. Basically I am using Eclipse for java and would like to know what all conversions takes place from source code(.java) to binary, what all files are linked using some linker, preprocessor etc etc.
Shall appreciate if you can point me to some link giving detail of complete build process for java. I have already searched this forum but did not get detail info.
Thanks
Edited:
So to be more precise I am looking for java equivalent of following build process in C:
I googled a lot but no gain! A figure like the following is not a must(though preferred), but if you can write 'n' sequential/parallel steps involved in complete Java build process, that will be really appreciated. Though much of the information provided by #Tom Anderson is very useful to me.
The first thing to appreciate is that your question contains a mistaken assumption. You ask about "the complete build process from source code to hardware specific binaries" - but the normal Java build process never produces architecture-specific binaries. It gets as far as architecture-independent bytecode, then stops. It's certainly true that in most cases, that bytecode will be translated into native code to be executed, but that step happens at runtime, inside the JVM, entirely in memory, and does not involve the production of binary files - it is not part of the build process.
There are exceptions to this - compilers such as GCJ can produce native binaries, but that is rarely done.
So, the only substantial step that occurs as part of a build process is compilation. The compiler reads in source code, does the usual parsing and resolution steps, and emits bytecode. That process is not in any way specified; as is usual, the language specification defines what the elements of the language are, and what they mean, but not how to compile them. What is specified in the format of the output: the bytecode is packaged in the form of class files, one per class, which in turn may be grouped together in jar files for ease of distribution.
When the class files come to be executed, there are then further steps needed before execution is possible. These are quite well-specified in the chapter on loading, linking, and initializing in the JVM specification. But, as i said, these are not really part of the build process.
There are a few other steps that may occur in a build process, usually prior to compilation: dependencies might be resolved and downloaded, resources might be copied and converted between character sets, and code might be generated. But none of this is standard, it's all stuff that's added on to the core process of compilation by various build tools.
There are some cool articles you can checkout if you want to know what's going on "behind the scenes".
http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-JVM-Works
This is one of them, it has a good explanation on how all the parts interact to run your code.
The main idea is that the bytecode is created from your Java files to run in a Virtual Machine, making your Java code (more or less...) independent of the OS and platform you're running it on.
The JVM, specific to that environment, is then responsible for translating that bytecode into actual instructions for the specific architecture you're running your code on.
This is the basis of the "Write once, run everywhere" mantra that Java has. Although the mantra doesn't always hold... it's still true in general.

Programming an Interpreter for a Compiler

I'm writing an interpreter for a compiler program in Java. So after checking the source code, syntax and semantics, I want to be able to run the source code, which is the input for my compiler. I'm just wondering if I can just translate some tokens, for example, out (it prints stuff on screen), can I just replace it with System.out.print? then feed the source code again to run it in java?
I've heard of using the Java Compiler API, would this be a good plan?
Thank you very much in advance!
What you asking is a virtual machine implementation technique, to run your Java code in general you should implement following:
The first few steps I guess you already done (Design/describe the language semantics, construct AST and perform required validation of the code)
You need to generate your byte code, original Java works exactly in the same way, it generates another representation of the source code, from human readable to machine readable.
Here you can see how Java byte code looks like http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/
You need to implement virtual aka stack machine that reads byte code and runs it for execution.
So as you can see you should have 3 separated components (projects) for your task:
1. Language grammar
2. Compiler (byte code generator)
3. Virtual machine (interpreter of byte code)
P.S. I have experience in creation of tiny Java similar compiler from scratch (define grammar with ANTlr, implementation of compiler, implementation of virtual machine), so probably I can share more information with you (even source code) if you need something particular
You really need to read some books and/or take courses on compilers - this can't be solved by a two-paragraph answer on SO.
You could create a cross-compiler which reads your language and outputs Java code to do the same thing. This may be the simplest option.
The Java Compiler API can be used to compile Java code. You would need to translate your existing code to Java first to use it.
This would not be the same thing as writing an interpreter. Is this homework? Does the task say you have to write the interpreter or can you have the code run any way which works?
Unfortunately you did not mention which scripting language are you planning to support. If it is one of well known languages, just use its ready interpreter written in pure java. See BSF and Java 5 scripting (http://www.ibm.com/developerworks/java/library/j-javascripting1/)
It it is your own language
think twice: do you really need it?
If you are sure you need your own language think about JavaCC
First of all, thank you very much for the fast replies.
As part of our compiler project, we need to be able to compile and run a program written in our own specified language. The language is very similar to C. I am confused on how an interpreter works, is there a simpler way to implement this? Without generating byte codes? My idea was to translate each statement into Java equivalent statements, and let Java handle the byte code generation.
I would look into the topics mentioned. Again, thank you very much for the suggestions.

Combining Java and C without gcj -- move C to Java or Java to C?

First, I have no experience doing this. But like the beginning of any good program, I have problem that I need to fix, so I'm willing to learn.
So many of you are probably already familiar with pdftk, the handy utility for handling various pdf-related tasks. So far as I can tell, most of these features are available in much newer, lighter libraries/extensions, except the one I need (and probably the only reason it still exists): merging form data files (fdf and xfdf) with a form PDF and getting a new file as the output.
The problem is that my server doesn't have gcj, which is fundamental to build/compile pdftk. I don't know if it's because I'm on Solaris or if it's for some other sysadmin-level reason, but I'm not getting gcj anytime soon. And there are no pre-compiled binaries for Solaris as far as I can find.
So I'm thinking that the MAKE file and C code can be rewritten to import the Java library (very ancient version of itext) directly, via javac.
But I'm not sure where to really start. All I know is:
I want a binary when I'm done, so that there won't be a need for a Java VM on every use.
The current app uses GCJ.
So my first thought was "Oh this is easy, I can probably just call the classes with some other C-based method", but instead of finding a simple method for doing this, I'm finding tons of lengthy posts on the various angles that this can be approached, etc.
Then I found a page on Sun's site on how to call other languages (like C) in a Java class. But the problems with that approach are:
I'd have to write a wrapper for the wrapper
I'd probably be better off skipping that part and writing the whole thing in Java
I ain't ready for that just yet if I can just import the classes with what is already there
I'm not clear on if I can compile and get a binary at the end or if I'm trapped in Java being needed every time.
Again, I apologize for my ignorance. I just need some advice and examples of how one would replace GCJ dependent C code with something that works directly with Java.
And of course if I'm asking one of those "if we could do that, we'd be rich already" type questions, let me know.
I'm not sure what you are looking for exactly, so I provided several answers.
If you have java code that needs to run, you must:
Run it in a jvm. You can start that vm within your own custom c-code, but it is still using a jvm
Rewrite it in another language.
Compile with an ahead-of-time compiler (eg gcj)
Incidentally, you could compile a copy of gcj in your home folder and use that. I believe the magic switch is --enable-languages=java,c (see: here for more)
If you have c-code you want to call from java, you have four options:
Java Native Interface (JNI). It seems you found this
Java Native Access (JNA). This is slower than JNI, but requires less coding and no wrapper c-code. It does require a jar and a library
Create a CLI utility and use Runtime.Exec(...) to call it.
Use some sort of Inter Process Communication to have the Java code ask the c-code to perform the operation and return the result.
Additional platform dependent options
Use JACOB (win32 only: com access)
I am not sure if I understand what you are looking for.
If you are looking to incorporate the C code into Java to make a native binary without the gcj, I think you are out of luck. You can include the C in Java, but it would be a primarily Java program meaning you would need the JVM on each run. Is there anything stopping you from compiling the gcj yourself?

Compiler to translate Java bytecode to platform-independent C code before runtime?

I'm looking for a compiler to translate Java bytecode to platform-independent C code before runtime (Ahead-of-Time compilation).
I should then be able to use a standard C compiler to compile the C code into an executable for the target platform. I understand this approach is suitable only for certain Java applications that are modified infrequently.
So what Java-to-C compilers are available?
I could suggest a tool called JCGO which is a Java source to C translator. If you need to convert bytecode then you can decompile the class files by some tool (e.g., JadRetro+Jad) and pass the source files to JCGO. The tool translates all the classes of your java program at once and produces C files (one .c and .h for each class), which could, further, be compiled (by third-party tools) into highly-optimized native code for the target platform. Java generics is not supported yet. AWT/Swing and SWT are supported.
Why do that? The Java virtual machine includes a runtime Java-to-assembly compiler.
Compilation at runtime can yield better performance, since all information about runtime values is available. While ahead-of-time compilation has to take assumptions about runtime values and thus may emits less fast code. Please refer to Java vs C performance by Cliff Click for more details.
GCJ has this capability, but it hasn't got great support for Java features past 1.4, and Swing support is likely to be troublesome. In practice though, the HotSpot JIT compiler beats all the ahead-of-time compilers for Java. See benchmarks from Excelsior JET.
To clarify: GCJ converts java source/bytecode to natively compiled code
Toba will convert (old) Java bytecode to C source. However, it hasn't been updated since Java 1.1. It may be helpful to partially facilitate the porting, but it just can't handle all the complex libraries Java has.
https://github.com/badlogic/jack -- Java to C++ transpiler, ignores memory model and other stuff, uses Boehm GC for extra slowness and GC pauses
The license is unclear to me.
http://ptolemy.eecs.berkeley.edu/publications/papers/03/java-2-C/ -- A Retargetable Optimizing Java-to-C Compiler for Embedded Systems
A paper, not sure whether the program is available.
(I've been googling for this stuff, this is how I came to this question at SO.)
AFAIK, there is no such product but you have two options:
Implement your own byte-code to C transpiler. Byte-code is pretty simple, this isn't too hard.
If you just want a native binary (i.e. when you don't need the C source code), then give GCJ a try.
Note: If you're doing this for performance reasons, then you're going to be disappointed. Java is generally as fast as C/C++. Moreover, improvements to the VM will make all Java code faster but not your native binary. Compiling the code will just give you a little better startup time.
Not really an answer to my own question, but how does Oracle do it?
http://download.oracle.com/docs/cd/B28359_01/java.111/b31225/chone.htm#BABCIHGA
There used to be a product called TowerJ, which was essentially a "via C" static compiler for Java, but it is long gone.
I was told that Sun Labs has created something like this as part of the Sun SPOT project, but I am not sure if it is public.
#BobMcGee: In the benchmarks you refer to, GCJ indeed loses, but Excelsior JET (which is a 32-bit AOT compiler) beats the 32-bit HotSpot on all three test systems, so I am not sure what was your point.
But, after all, there are lies, damn lies, and benchmarks. :)

Categories

Resources