Why does Java generate multiple .class files on compilation?

In Java, on compilation we get a .class file for each class (including nested classes and interfaces) defined in the source file. What is the reason for this multiple .class file generation? Is it to simplify reuse of the classes? Why not generate one .class file for one .java file?

The JVM needs to be able to find the code for a given class, given its name. If there's potentially no relationship between the source filename and the code filename, and you want the code filename to be based on the source filename, how would you expect it to load the code?
As an example: suppose I were to compile Foo.java which contains class Bar.
Another class then refers to Bar, so the JVM needs the code for it... how would you suggest it finds the file?
Note that in .NET there's a separate unit of deployment called the assembly, and a reference to a type includes the assembly name as well, but that's slightly different from what you were proposing.
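To see the one-file-per-class behaviour concretely, here is a small sketch (Foo, Bar and Baz are illustrative names echoing the example above). Each nested type gets its own binary name, and javac writes one class file per binary name: Foo.class, Foo$Bar.class, Foo$Baz.class, plus a file such as Foo$1.class for the anonymous class. The JLS only requires an anonymous class's binary name to be the enclosing class's name, "$", and some digits; the exact number `1` is javac's convention.

```java
// javac writes Foo.class, Foo$Bar.class, Foo$Baz.class and Foo$1.class for this file.
class Foo {
    static class Bar {}   // static nested class
    class Baz {}          // inner (non-static) class

    static Object anonymous() {
        return new Object() {};  // anonymous class
    }

    public static void main(String[] args) {
        System.out.println(Bar.class.getName());               // Foo$Bar
        System.out.println(Baz.class.getName());               // Foo$Baz
        System.out.println(anonymous().getClass().getName());  // Foo$1 under javac
    }
}
```

The names printed are the same names the JVM later uses to look the classes up, which is why each one needs a file it can be found in.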

In response to Jon Skeet's rhetorical question:
Another class then refers to Bar, so the JVM needs the code for it... how would you suggest it finds the file?
Suppose (hypothetically) that the Java classfile format represented nested/inner classes by embedding them in the classfile for the outermost class. The binary name for Bar is some.pkg.Foo$Bar (its type descriptor is "Lsome/pkg/Foo$Bar;"). The class loader could split the name at the "$" character, use the first part to locate the classfile for Foo, and then navigate to the embedded Bar class representation.
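The embedded layout is purely hypothetical, so the following is only a sketch of the lookup step described above: cut the binary name at the first "$" in its last segment and map the outer class to a file path. The class and method names are made up for illustration. The last call in `main` hints at one snag: "$" is a legal identifier character, so an ordinary top-level class named My$Class would be routed to the wrong file.

```java
// HYPOTHETICAL: no real JVM uses an embedded format; this only sketches
// how a loader for one might pick the file to open.
class EmbeddedLookupSketch {
    // "some.pkg.Foo$Bar" -> "some/pkg/Foo.class", the file that would embed Bar.
    static String outermostClassFile(String binaryName) {
        int start = binaryName.lastIndexOf('.') + 1;  // only look at the simple-name part
        int dollar = binaryName.indexOf('$', start);
        String outer = dollar < 0 ? binaryName : binaryName.substring(0, dollar);
        return outer.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(outermostClassFile("some.pkg.Foo$Bar"));     // some/pkg/Foo.class
        System.out.println(outermostClassFile("some.pkg.Foo$Bar$Baz")); // some/pkg/Foo.class
        // Ambiguity: a top-level class literally named My$Class is misrouted.
        System.out.println(outermostClassFile("a.My$Class"));           // a/My.class
    }
}
```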
I think that the real reason that inner/nested classes have separate classfiles is historical. IIRC, Java 1.0 did not support nested or inner classes, and hence the corresponding classfile formats did not need to deal with them. When Java 1.1 was created (supporting inner/nested classes), Sun wanted the classfile format to be compatible with the classfiles produced by the Java 1.0 compiler. So they chose to implement inner/nested classes as separate classfiles, using the "$" character in the binary class name (legal in identifiers, but by convention intended for mechanically generated names).
A second possible reason is that the flat format simplifies class loading compared to a hypothetical embedded format.
And finally, there was (and still is) no compelling reason for them NOT to use a flat file format. It may cause some minor head-scratching when a programmer wants to load inner classes using Class.forName(), but that is a pretty rare occurrence ... and the solution is straightforward.
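The straightforward solution is to give Class.forName the binary name, in which a nested type is joined to its enclosing type with "$" rather than ".". A minimal sketch using the JDK's own nested interface java.util.Map.Entry (the helper name `load` is just for illustration):

```java
class NestedLookup {
    // Returns the class for `name`, or null if no class has that binary name.
    static Class<?> load(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Canonical (source) name: fails, because no class has the binary name java.util.Map.Entry.
        System.out.println(load("java.util.Map.Entry")); // null
        // Binary name: nested type joined with '$', matching the file Map$Entry.class.
        System.out.println(load("java.util.Map$Entry")); // interface java.util.Map$Entry
    }
}
```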

That is a design decision regarding a compilation unit, made by the developers.
Compiled classes are usually combined in a jar file.
Extract from the Java Language Specification:
7.3 Compilation Units
CompilationUnit is the goal symbol (§2.1) for the syntactic grammar (§2.3) of Java programs.
Types declared in different compilation units can depend on each other, circularly.
A Java compiler must arrange to compile all such types at the same time.


Some comparison between Java package and C# namespace

From Programming Language Pragmatics, 4th ed., by Michael Scott:
C# follows Java’s lead in extracting header information automatically from complete class definitions.
Then it continues to mention where namespaces in C# differ from packages in Java:
Its module-level syntax, however, is based on the namespaces of C++, which allow a single file to contain fragments of multiple namespaces.
Does Java allow a single file to contain fragments of multiple packages?
There is also no notion of standard search path in C#: to build a complete program, the programmer must provide the compiler with a complete list of all the files required.
How does a C# programmer provide the compiler with a complete list of all the files required?
Thanks.
Does Java allow a single file to contain fragments of multiple packages?
If there is a package declaration, it must be the first declaration in your Java source file, ahead of any imports or type declarations. This means that the answer to your question is "no": you can have at most one package declaration per Java source file.
How does a C# programmer provide the compiler with a complete list of all the files required?
This applies only to building on the command line, because IDEs take care of this automatically. When you build your code on the command line with csc.exe you must provide a list of all files composing your module either by listing them one-by-one, e.g.
csc src\File1.cs src\File2.cs src\File3.cs
or by specifying a pattern:
csc src\*.cs

How can I find the files imported in a Java class?

I can't see the source code, only the .class file.
Can I still find out which files are imported?
Keep in mind that imports are simply a convenience mechanism that lets the Java developer refer to a class using its simple name (Date) rather than its Fully Qualified Name (FQN: java.util.Date or java.sql.Date).
So if you run the .class file through a decompiler, you'll likely see references using the FQN and possibly no import statements.
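Well, if you just want to do it manually, I suggest you take a look at a decompiler such as JD-GUI.
Otherwise, you need to go the reflection way if you want this information programmatically (bearing in mind that reflection only exposes the types used in declarations and signatures, not those used inside method bodies).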
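If you do go the reflection route, a sketch like the following collects the types that appear in a class's declarations: supertypes, field types, and method and constructor signatures. It is an approximation, not a list of imports: types used only inside method bodies never show up, and unused imports leave no trace at all. The class name SignatureTypes is illustrative.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

class SignatureTypes {
    // Types visible through reflection: supertypes plus field, method and constructor signatures.
    // Classes used only inside method bodies are NOT found this way.
    static Set<String> of(Class<?> c) {
        Set<String> names = new TreeSet<>();
        if (c.getSuperclass() != null) add(names, c.getSuperclass());
        for (Class<?> i : c.getInterfaces()) add(names, i);
        for (Field f : c.getDeclaredFields()) add(names, f.getType());
        for (Method m : c.getDeclaredMethods()) {
            add(names, m.getReturnType());
            for (Class<?> p : m.getParameterTypes()) add(names, p);
            for (Class<?> e : m.getExceptionTypes()) add(names, e);
        }
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            for (Class<?> p : k.getParameterTypes()) add(names, p);
        }
        return names;
    }

    // Unwraps array types and skips primitives such as int and void.
    private static void add(Set<String> names, Class<?> t) {
        while (t.isArray()) t = t.getComponentType();
        if (!t.isPrimitive()) names.add(t.getName());
    }
}
```

For example, SignatureTypes.of(java.util.Date.class) includes java.time.Instant, because Date declares from(Instant) and toInstant().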
If you need to do this in batch, and do not want to bother with all the other details that a decompiler would offer, you can inspect the constant pool for class references.
Beware that, as previously mentioned, source imports are merely a convenience and do not correspond directly to anything in class files. Scanning the constant pool will not show unused imports from the source file, and it will not show classes used only for compile-time constants (public static final String ... and the like). It will show FQNs even for classes in the same package, and it will show classes referred to using FQN without an import. It will show classes whose signatures are used implicitly:
URL loc = Something.class.getProtectionDomain().getCodeSource().getLocation();
will produce references to ProtectionDomain and CodeSource in bytecode even though the source did not explicitly mention them.
https://hg.netbeans.org/core-main/raw-file/default/nbbuild/antsrc/org/netbeans/nbbuild/VerifyClassLinkage.java is an example of how to do this scan (see the dependencies method).
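For the batch approach, a minimal constant-pool scanner needs only java.io: read the class file header, walk the constant pool, and collect the names behind CONSTANT_Class entries. This sketch follows the constant pool layout of the JVM specification (tags 1 to 20; Long and Double entries occupy two slots) and, as the caveats above say, reports class references rather than source imports. The class and method names are illustrative, and the `Class<?>` overload assumes the class file is reachable as a resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ConstantPoolClasses {
    // Lists every CONSTANT_Class entry in a class file's constant pool.
    // Array types appear in descriptor form, e.g. "[Ljava.lang.String;".
    static Set<String> classReferences(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        data.readUnsignedShort();                 // minor_version
        data.readUnsignedShort();                 // major_version
        int count = data.readUnsignedShort();     // constant_pool_count; entries are 1..count-1
        String[] utf8 = new String[count];
        List<Integer> classNameIndexes = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1 -> utf8[i] = data.readUTF();                       // Utf8: u2 length + modified UTF-8
                case 7 -> classNameIndexes.add(data.readUnsignedShort()); // Class: u2 name_index
                case 8, 16, 19, 20 -> data.readFully(new byte[2]);        // String, MethodType, Module, Package
                case 15 -> data.readFully(new byte[3]);                   // MethodHandle
                case 3, 4, 9, 10, 11, 12, 17, 18 -> data.readFully(new byte[4]);
                case 5, 6 -> { data.readFully(new byte[8]); i++; }        // Long/Double take two slots
                default -> throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int index : classNameIndexes) names.add(utf8[index].replace('/', '.'));
        return names;
    }

    // Convenience: scans the class file that defines `c`, read as a resource.
    static Set<String> classReferences(Class<?> c) {
        String name = c.getName();
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("no class file found for " + name);
            return classReferences(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```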

Java packages vs. C++ libraries

In Java, there is what is called a package. Does a library in C++ mean the same thing, especially in terms of, for example, containing related classes and the use of protected members?
Thanks.
There are different dimensions to what a package means in Java. As a container that differentiates the names of the classes inside from the names of classes in other packages, its equivalent would be C++ namespaces.
As a unit that guarantees access to non-private members to classes in the same package, there is no equivalent in C++. The access level granted to a class is independent of the namespace where the class is defined.
As a way of organizing your sources on disk, there is no equivalent; the C++ language has no requirements on how the code is stored in files.
Regarding C++ libraries, those are closer to jar files in Java. They bundle different classes that share some relation. A jar can contain more than one package, and more than one jar can contain classes from the same package. Similarly with libraries, they can contain classes from different namespaces and/or different libraries can contain classes from the same namespace.
The closest to Java packages are namespaces in C++.
They can be nested into one another, and you need to specifically declare that you are using them or a part of their contents. However, they do not enforce any physical file hierarchy like Java packages do.
Strictly speaking I think that namespaces in C++ provide the same semantics.
I guess it is more related to namespaces in C++.
Java and C++ both use libraries. A library can be any independent set of classes (probably a framework) which can be accessed in our code.
External libraries exist in both Java and C++; just the formats vary: .jar in Java and .dll/.so in C++.
The purpose of packages and namespaces is different from that of libraries: they avoid running out of names by allowing the user to group the source logically.
A package in Java is a namespace for classes, interfaces and enums. Package name, a dot and the classname form the fully qualified classname of a class:
com.example.hello.HelloWorldApplication
^--packagename--^ ^-----classname-----^
The dots in the package name have a different meaning than the dot between the names: the first two dots in this example are part of the package name, the last one is a separator.
This should be kept in mind, because there's a common misunderstanding regarding package names: just because the names can be mapped to a hierarchical folder structure, some people think package names have a hierarchy too, which is not the case: hello is not a "subpackage" of example!
But, to create a simple mapping to folders and files, a classloader can simply take the fully qualified class name, replace all dots with a slash and append .class to get a relative path to a class file.
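The split and the folder mapping described above take only a few lines; here is a sketch (the class and method names are made up for illustration). It works on binary names, so a nested type such as java.util.Map$Entry maps to java/util/Map$Entry.class, the separate class file discussed in the first question.

```java
class ClassFileNames {
    // Package part of a binary name: everything before the last dot ("" for the default package).
    static String packageName(String binaryName) {
        int dot = binaryName.lastIndexOf('.');
        return dot < 0 ? "" : binaryName.substring(0, dot);
    }

    // Relative path a file-system class loader would try: dots become slashes, ".class" is appended.
    static String relativePath(String binaryName) {
        return binaryName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String name = "com.example.hello.HelloWorldApplication";
        System.out.println(packageName(name));  // com.example.hello
        System.out.println(relativePath(name)); // com/example/hello/HelloWorldApplication.class
        // The JDK agrees on the package part (Class.getPackageName, Java 9+).
        System.out.println(String.class.getPackageName()); // java.lang
    }
}
```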
But note again that a folder/file mapping is not required to load classes: we can invent a class loader that gets classes from a database or a remote service, where a folder/file mapping wouldn't make any sense.
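As a sketch of that idea, here is a class loader whose "storage" is an in-memory Map from binary name to class file bytes, standing in for a database or remote service (MapClassLoader, whoLoaded and emptyClassFile are hypothetical names). It overrides findClass, so ordinary parent-first delegation still applies, and no file path is ever computed. To stay self-contained, emptyClassFile hand-assembles a minimal class file for an empty class instead of fetching real bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// HYPOTHETICAL loader: class bytes come from an in-memory map standing in for a
// database or remote service, so no folder/file mapping is involved.
class MapClassLoader extends ClassLoader {
    private final Map<String, byte[]> store; // binary name -> class file bytes

    MapClassLoader(Map<String, byte[]> store) {
        super(MapClassLoader.class.getClassLoader()); // usual parent-first delegation
        this.store = store;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = store.get(name); // a key lookup instead of a file path
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    // Reports which loader defined `name`, or that no loader could find it.
    static String whoLoaded(ClassLoader loader, String name) {
        try {
            Class<?> c = loader.loadClass(name);
            return name + (c.getClassLoader() == loader ? " from store" : " from parent");
        } catch (ClassNotFoundException e) {
            return name + " not found";
        }
    }

    // Hand-assembles a minimal valid class file: an empty class in the default package
    // that extends Object and has no fields or methods. Illustration only.
    static byte[] emptyClassFile(String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                                  // minor_version
            out.writeShort(52);                                 // major_version (Java 8)
            out.writeShort(5);                                  // constant_pool_count: 4 entries + 1
            out.writeByte(1); out.writeUTF(name);               // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8
            out.writeByte(7); out.writeShort(3);                // #4 Class -> #3
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                  // this_class
            out.writeShort(4);                                  // super_class
            out.writeShort(0);                                  // interfaces_count
            out.writeShort(0);                                  // fields_count
            out.writeShort(0);                                  // methods_count
            out.writeShort(0);                                  // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        MapClassLoader loader = new MapClassLoader(Map.of("FromStore", emptyClassFile("FromStore")));
        System.out.println(whoLoaded(loader, "FromStore"));        // FromStore from store
        System.out.println(whoLoaded(loader, "java.lang.String")); // java.lang.String from parent
        System.out.println(whoLoaded(loader, "NotInTheStore"));    // NotInTheStore not found
    }
}
```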

Why is each public class in a separate file?

I recently started learning Java and found it very strange that every Java public class must be declared in a separate file. I am a C# programmer and C# doesn't enforce any such restriction.
Why does Java do this? Were there any design considerations?
Edit (based on a few answers):
Why is Java not removing this restriction now in the age of IDEs? This will not break any existing code (or will it?).
I recently took a C# solution and did exactly this: I split every file that had multiple public classes in it into individual files, and it has made life much easier.
If you have multiple public classes in a file you have a few issues:
What do you name the file? One of the public classes? Another name? People have enough issues around poor solution code organization and file naming conventions to have one extra issue.
Also, when you are browsing the file/project explorer, it's good that things aren't hidden. For example, you see one file, drill down, and there are 200 classes all mushed together. If you have one class per file, you can organize your tests better and get a feel for the structure and complexity of a solution.
I think Java got this right.
According to the Java Language Specification, Third Edition:
This restriction implies that there must be at most one such type per compilation unit. This restriction makes it easy for a compiler for the Java programming language or an implementation of the Java virtual machine to find a named class within a package; for example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
Emphasis is mine.
It seems like basically they wanted to translate the OS's directory separator into dots for namespaces, and vice versa.
So yes, it was a design consideration of some sort.
From Thinking in Java:
There can be only one public class per compilation unit (file).
The idea is that each compilation unit has a single public interface represented by that public class. It can have as many supporting “friendly” classes as you want. If you have more than one public class inside a compilation unit, the compiler will give you an error message.
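A sketch of such a compilation unit: one primary class plus a package-private ("friendly") helper. In a real Toad.java the class Toad would be declared public; it is left package-private here only so the snippet compiles regardless of the file name it is saved under. Either way, javac writes a separate Toad.class and Croak.class. The names are illustrative.

```java
// Sketch of one compilation unit (Toad.java). In a real Toad.java, Toad would be
// `public`; it is package-private here so the snippet compiles under any file name.
class Toad {
    String speak() {
        return new Croak(3).toString();
    }
}

// Package-private ("friendly") helper: not visible outside the package, so it can
// share the file. javac still writes it to its own Croak.class.
class Croak {
    private final int times;

    Croak(int times) {
        this.times = times;
    }

    @Override
    public String toString() {
        return "croak ".repeat(times).trim();
    }
}
```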
From the specification (7.2.6)
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
This restriction implies that there must be at most one such type per compilation unit.
This restriction makes it easy for a compiler for the Java programming language or an implementation of the Java virtual machine to find a named class within a package; for example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
In short: it may be about finding classes without having to load everything on your classpath.
Edit: "may choose" seems to leave open the possibility of not following that restriction, and the meaning of "may" is probably the one described in RFC 2119 (i.e. "optional").
In practice, though, this is enforced on so many platforms and relied upon by so many tools and IDEs that I do not see any "host system" choosing not to enforce that restriction.
From "Once upon an Oak ..."
It's pretty obvious - like most things are once you know the design reasons - the compiler would have to make an additional pass through all the compilation units (.java files) to figure out what classes were where, and that would make the compilation even slower.
(Note:
the Oak Language Specification for Oak version 0.2 (PostScript document): Oak was the original name of what is now commonly known as Java, and this manual is the oldest manual available for Oak (i.e. Java).
For more history on the origins of Java, please have a look at the Green Project and Java(TM) Technology: An Early History
)
It's just to avoid confusion in the sense that Java was created with simplicity in mind from the perspective of the developer. Your "primary" classes are your public classes and they are easy to find (by a human) if they are in a file with the same name and in a directory specified by the class's package.
You must recall that the Java language was developed in the mid-90s, in the days before IDEs made code navigation and searching a breeze.
If a class is only used by one other class, make it a private inner class. This way you have your multiple classes in a file.
If a class is used by multiple other classes, which of these classes would you put into the same file? All three? You would end up having all your classes in a single file...
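Following that advice, a helper used by just one class can become a private nested class of it. The sketch below uses a private static nested class, since the helper needs no reference to an enclosing instance; javac still emits it as its own file, Invoice$LineItem.class. Invoice and LineItem are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// One top-level class per file; a helper used only by Invoice lives inside it.
class Invoice {
    // Private: no other class can even name it. Compiled to Invoice$LineItem.class.
    private static final class LineItem {
        final String description;
        final int cents;

        LineItem(String description, int cents) {
            this.description = description;
            this.cents = cents;
        }
    }

    private final List<LineItem> items = new ArrayList<>();

    Invoice add(String description, int cents) {
        items.add(new LineItem(description, cents));
        return this;
    }

    int totalCents() {
        return items.stream().mapToInt(item -> item.cents).sum();
    }
}
```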
That's just how the language designers decided to do it. I think the main reason was to optimize the compiler pass-throughs - the compiler does not have to guess or parse through files to locate the public classes. I think it's actually a good thing, it makes the code files much easier to find, and forces you to stay away from putting too much into one file. I also like how Java forces you to put your code files in the same directory structure as the package - that makes it easy to locate any code file.
It is technically legal to have multiple Java top level classes in one file. However this is considered to be bad practice (in most cases), and some Java tools may not work if you do this.
The JLS says this:
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
Note the use of may in the JLS text. This says that a compiler may reject this as invalid, or it may not. That is not a good situation if you are trying to build your Java code to be portable at the source code level. Thus, even if multiple classes in one source file works on your development platform, it is bad practice to do this.
My understanding is that this "permission to reject" is a design decision that is intended in part to make it easier to implement Java on a wider range of platforms. If (conversely) the JLS required all compilers to support source files containing multiple classes, there would be conceptual issues implementing Java on a platform which wasn't file-system based.
In practice, seasoned Java developers don't miss being able to do this at all. Modularization and information hiding are better done using an appropriate combination of packages, class access modifiers and inner or nested classes.
Why is Java not removing this restriction now in the age of IDEs? This will not break any existing code (or will it?).
Now all code is uniform. When you see a source file, you know what to expect; it is the same for every project. If Java were to remove this convention, you would have to relearn the code structure for every project you work on, whereas now you learn it once and apply it everywhere. We should not be relying on IDEs for everything.
Not really an answer to the question but a data point none the less.
I grepped the headers of my personal C++ utility library (you can get it yourself from here) and almost all of the header files that actually declare classes (some just declare free functions) declare more than one class. I like to think of myself as a pretty good C++ designer (though the library is a bit of a bodge in places; I'm its only user), so I suggest that for C++ at least, multiple classes in the same file are normal and even good practice.
It allows for simpler heuristics for going from Foobar.class to Foobar.java.
If Foobar could be in any Java file you have a mapping problem, which may eventually mean you have to do a full scan of all java files to locate the definition of the class.
Personally, I have found this to be one of the strange rules that, in combination, allow Java applications to grow very large and still stay sturdy.
Well, actually it is an optional restriction according to the Java Language Specification (Section 7.6, page 209), but the Oracle Java compiler enforces it as mandatory. According to the Java Language Specification:
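The heuristic can be sketched as a string transformation from a class file path to the source file path a tool would try first (the names are illustrative). It strips any nested-class suffix after "$" and swaps the extension. It is only a guess: a package-private top-level class declared in some other file breaks it, which is one reason class files also carry a SourceFile attribute recording the actual source file name.

```java
// Heuristic: the class file path implies the source path a tool would try first.
class SourceGuess {
    static String guessSourcePath(String classFilePath) {
        if (!classFilePath.endsWith(".class")) {
            throw new IllegalArgumentException("not a class file path: " + classFilePath);
        }
        String base = classFilePath.substring(0, classFilePath.length() - ".class".length());
        int start = base.lastIndexOf('/') + 1;   // only inspect the file-name part
        int dollar = base.indexOf('$', start);   // Foobar$Inner -> Foobar
        if (dollar >= 0) {
            base = base.substring(0, dollar);
        }
        return base + ".java";
    }

    public static void main(String[] args) {
        System.out.println(guessSourcePath("com/acme/Foobar.class"));       // com/acme/Foobar.java
        System.out.println(guessSourcePath("com/acme/Foobar$Inner.class")); // com/acme/Foobar.java
    }
}
```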
When packages are stored in a file system (§7.2.1), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:
The type is referred to by code in other compilation units of the package in which the type is declared.
The type is declared public (and therefore is potentially accessible from code in other packages).
This restriction implies that there must be at most one such type per compilation unit. This restriction makes it easy for a Java compiler to find a named class within a package.
In practice, many programmers choose to put each class or interface type in its own compilation unit, whether or not it is public or is referred to by code in other compilation units.
For example, the source code for a public type wet.sprocket.Toad would be found in a file Toad.java in the directory wet/sprocket, and the corresponding object code would be found in the file Toad.class in the same directory.
To get a clearer picture, let's imagine there are two public classes, public class A and public class B, in the same source file, and class A has a reference to the not-yet-compiled class B. While we are compiling (compiling, linking, loading) class A, when linking to class B the compiler will be forced to examine every *.java file in the current package, because class B doesn't have its own B.java file. So in the above case, it is somewhat time-consuming for the compiler to find which class lies in which source file and in which class the main method lies.
So the reason behind keeping one public class per source file is to actually make compilation process faster because it enables a more efficient lookup of the source and compiled files during linking (import statements). The idea is if you know the name of a class, you know where it should be found for each classpath entry and no indexing will be required.
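That lookup can be sketched with java.nio.file: compute the one candidate path per classpath root and probe the roots in order, without building any index. This sketch assumes the classpath entries are plain directories (jar files would need a different probe), and ClasspathLookup and the root paths in `main` are illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Sketch assuming every classpath entry is a directory (no jar support).
class ClasspathLookup {
    // The single place a class can be under one root, derived from its name alone.
    static Path candidate(Path root, String binaryName) {
        return root.resolve(binaryName.replace('.', '/') + ".class");
    }

    // Probe roots in classpath order; no index of the roots' contents is built.
    static Optional<Path> find(List<Path> roots, String binaryName) {
        for (Path root : roots) {
            Path path = candidate(root, binaryName);
            if (Files.isRegularFile(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Path> roots = List.of(Path.of("build/classes"), Path.of("lib/classes")); // hypothetical roots
        System.out.println(candidate(roots.get(0), "wet.sprocket.Toad")); // e.g. build/classes/wet/sprocket/Toad.class
        System.out.println(find(roots, "wet.sprocket.Toad"));
    }
}
```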
Also, as soon as we execute our application, the JVM by default looks for the public class (since it has no access restrictions and can be accessed from anywhere) and also looks for public static void main(String args[]) in that public class. The public class acts as the initial class from which the JVM instance for the Java application (program) begins. So when we provide more than one public class in a program, the compiler itself stops us by throwing an error. This is because we must not confuse the JVM about which class is its initial class, since only one public class with public static void main(String args[]) is the initial class for the JVM.
You can read more on Why Single Java Source File Can Not Have More Than One public class

Why classes compile to .class but interface does not to .interface

Is there any specific reason why an interface in MyInterface.java is not compiled into a .interface file, while any class is compiled into a .class file?
Because the point is to indicate that the file is Java byte code (and .class was the extension chosen for that), not the specific language construction.
Java treats interfaces almost like classes, e.g. they share the same namespace (you can't have an interface with the same name as a class) and a compiled interface is almost identical to a compiled abstract class.
So it would not make any sense to store them in a different format or with a different file extension. On the contrary, this would make many things harder. For example, when you load a class or interface by name (Class.forName("my.class.name")), Java does not know whether it is a class or an interface. If there were two different extensions, Java would have to try to find a file "my/class/name.class" and then "my/class/name.interface", instead of only trying the first one.
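A small sketch of that point: Class.forName is given only a name, the same lookup finds both classes and interfaces, and the kind is only known after loading, via isInterface(). The modifiers also show how close a compiled interface is to an abstract class. KindOf is an illustrative name.

```java
import java.lang.reflect.Modifier;

class KindOf {
    // One lookup by name whatever the kind; the kind is read from the loaded class afterwards.
    static String kind(String binaryName) {
        try {
            return Class.forName(binaryName).isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        System.out.println(kind("java.lang.Runnable")); // interface
        System.out.println(kind("java.lang.Thread"));   // class
        // An interface's modifiers include "abstract", just like an abstract class's.
        System.out.println(Modifier.toString(Runnable.class.getModifiers()));               // public abstract interface
        System.out.println(Modifier.toString(java.util.AbstractList.class.getModifiers())); // public abstract
    }
}
```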
The physical representation of the byte-code on the file system doesn't matter.
It's the logical realization (whether class or interface) that matters.
That's the way the language designers decided.
It makes sense in several ways:
.class files are a byproduct that you don't normally see or manipulate by hand.
The fewer different extensions a program uses, the easier it is to maintain.
In many cases, there's no distinction in the code between a class and an interface, so it's logical that the binary files look alike.
Frankly, I can't think of a good reason to have different extensions for compiled classes and interfaces. Why would it be important to distinguish between them?
In Java, you have source files, called .java, and binaries, called .class. It's just a choice of naming.
Also, in Java, classes and interfaces don't differ that much (a class just contains a lot of extra information, like method bodies).
It is just a choice they made. I wouldn't bother about it. It is a binary file anyway. One way to think of it: "Even if it is an interface, it is still in a .java file."
