When you compile your Java files, does the compiler also embed your Javadoc and comments into the class file?
For example, if you have large Javadoc comments, do they affect the overall size of your class file? Or does the compiler ignore everything beginning with // and /* ?
No, comments are not compiled into your class files. This includes JavaDocs.
Instead, you need to use a JavaDoc tool (like Sun/Oracle's) on the source code to generate the documentation.
No, the class file contains only compiled bytecode and metadata, not the source text.
Annotations may be retained (depending on the annotation).
Comments won't affect the size of the class file.
No. There are several debug options that affect the size of a class file but the comments are never part of the resulting .class file.
Some estimates:
-g:lines just adds line number information (a few bytes).
-g:vars includes the full names of all local variables. This is usually the most expensive option.
-g:source just adds the name of the source file (without path).
Note: -parameters makes the names of method parameters accessible via reflection. This is independent of -g:vars.
Comments (and therefore JavaDoc) are never added to the bytecode.
To see what ends up in the .class file, use javap -v plus the path of the file.
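A quick way to check this yourself is to compile the same class with and without comments and compare the results. The sketch below assumes a JDK 11+ (it uses javax.tools.ToolProvider, whose compiler is null on a runtime without one); ClassFileSizeCheck and Demo are made-up names. With -g:none the two class files should be byte-for-byte identical; with the other -g options the sizes should still match, because comments only shift the line numbers that get recorded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ClassFileSizeCheck {
    // The same class, once without comments and once with Javadoc, line and block comments.
    public static final String PLAIN =
            "public class Demo {\n  int answer() { int x = 42; return x; }\n}\n";
    public static final String COMMENTED =
            "/**\n * A long Javadoc comment.\n * Imagine thousands of lines here.\n */\n"
            + "public class Demo {\n  // a line comment\n  /* a block comment */\n"
            + "  int answer() { int x = 42; return x; }\n}\n";

    // Writes `source` as Demo.java into a fresh temp dir, compiles it with the given
    // options and returns the bytes of Demo.class. Requires a JDK, not a plain JRE.
    public static byte[] compile(String source, String... options) {
        try {
            Path dir = Files.createTempDirectory("demo");
            Path file = Files.writeString(dir.resolve("Demo.java"), source);
            List<String> args = new ArrayList<>(Arrays.asList(options));
            args.addAll(List.of("-d", dir.toString(), file.toString()));
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac.run(null, null, null, args.toArray(new String[0])) != 0) {
                throw new IllegalStateException("javac failed for " + args);
            }
            return Files.readAllBytes(dir.resolve("Demo.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String option : new String[] {"-g:none", "-g:source", "-g:lines", "-g:vars", "-g"}) {
            System.out.printf("%-10s plain: %d bytes, commented: %d bytes%n",
                    option, compile(PLAIN, option).length, compile(COMMENTED, option).length);
        }
        System.out.println("identical with -g:none: "
                + Arrays.equals(compile(PLAIN, "-g:none"), compile(COMMENTED, "-g:none")));
    }
}
```

The exact byte counts depend on the JDK version; what should stay constant is that each pair of numbers is equal.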
Related
I have a pile of .java files. They all declare the same class, public class MyClass, and they all have a main method. Each may or may not have a package declaration at the top, and I do not know ahead of time which.
I am trying to write a script to compile and run these Java programs. This is easy for the files without a package declaration: I just do some cp operations to set up, run javac MyClass.java and java MyClass, then rm to tear down. However, the files with a package declaration require special attention. A few options occur to me, including deleting the package lines, or reading the package lines so that I know what the resulting directory structure should be. Both of these require me to parse the .java files, which makes me sad.
Is there a way to compile and run these files without having to parse the .java files? Something like:
javac --ignore_package_structure MyClass.java
would be ideal, but a quick look at the javac man pages suggests that such a thing doesn't exist.
If we can assume that each student submits a single source file named HelloWorld.java, then we can use the "Launch Single-File Source-Code Programs" feature added by JEP 330 in Java 11:
java HelloWorld.java
We don't run javac first, we don't get a .class file (no cleanup needed), and any package declaration is handled automatically.
Remember, the students are still allowed to use many classes, they just all have to be submitted to you in a single source file.
The name of the class doesn't even matter. The first class in the source file is executed.
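For the original scripting problem, a minimal runner written in Java itself might look like the sketch below. It assumes JDK 11+ and that every submission is a single .java file in one directory; SubmissionRunner and its methods are hypothetical names. It launches each file with the java launcher of the JDK it runs on (found via the java.home system property) in source-file mode, so it never reads the package declaration and leaves no .class files behind.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubmissionRunner {
    // The `java` launcher of the JDK running this program (assumed to be JDK 11+).
    static Path javaLauncher() {
        return Path.of(System.getProperty("java.home"), "bin", "java");
    }

    // Runs one submission as `java <file>.java` and returns its combined stdout and stderr.
    // Nothing is parsed and no .class file is written, so there is nothing to clean up.
    public static String run(Path source) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(javaLauncher().toString(), source.toString())
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        List<Path> sources;
        try (Stream<Path> files = Files.list(dir)) {
            sources = files.filter(f -> f.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        for (Path source : sources) {
            System.out.println("=== " + source.getFileName());
            System.out.print(run(source));
        }
    }
}
```

Each submission is started in its own JVM, which keeps misbehaving programs isolated from each other at the cost of some startup time per file.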
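There isn't any easy way to do this. You could use a regex, though, and remove every package declaration with a simple Java regex like this one (with the MULTILINE flag, so that ^ matches at the start of each line):
^\s*package\s+[\w.]+\s*;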
Simply put, you write a Java program that strips the package declarations.
See also: Find and replace words/lines in a file
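A minimal sketch of such a program, assuming each package declaration sits on a single line and uses ASCII identifiers (\w does not match other Unicode letters); StripPackage is a hypothetical name. It rewrites the files passed on the command line in place, so run it on copies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class StripPackage {
    // Matches e.g. "package com.example;" at the start of a line, plus its line terminator.
    private static final Pattern PACKAGE_DECL =
            Pattern.compile("^\\s*package\\s+[\\w.]+\\s*;[ \\t]*\\R?", Pattern.MULTILINE);

    // Removes the first package declaration; sources without one are returned unchanged.
    public static String stripPackage(String source) {
        return PACKAGE_DECL.matcher(source).replaceFirst("");
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Path.of(arg);
            Files.writeString(file, stripPackage(Files.readString(file)));
        }
    }
}
```

Only the first match is replaced because a compilation unit can contain at most one package declaration; a later line that happens to start with "package" inside a text block or comment is left alone.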
I have some tool creating an options file for Javadoc containing lots of individual Java source files to process. That tool simply adds all Java files automatically and allows me to add additional options files to the process created manually. The goal is to use such an additional options file to make Javadoc ignore some of the explicitly defined source files.
The first automatically generated options file looks like the following:
-classpath '[...]'
-d '[...]'
-doctitle '[...]'
'C:\\Users\\[...]\\package-info.java'
'C:\\Users\\[...]\\[...].java'
[...]
It contains many more paths, one for each Java file in my project, of course. The tool then invokes Javadoc as follows, where I control every options file except the first:
javadoc @optsFile1 @optsFile2 @optsFile3
So, is it possible at all to somehow override the explicit paths of the first file using some options in the later files only?
I already tried various combinations of -exclude and -subpackages, but none of them worked. Javadoc always seems to process the files explicitly listed in the first file and also outputs their HTML. I don't mind the unnecessary processing of those files; I just don't want their HTML in the output folder. It would be great to have some option to post-filter things based on package names, because I would like to avoid dealing with paths.
Thanks!
I was inspecting the class file format because I wanted to add source code to the class file (which was possible in early Java versions), but all I found were the SourceFile attribute and the SourceDebugExtension attribute. I was looking for the complete source code of the class to be bundled with the class file to ease the post-processing pipeline.
Does anyone know whether I am misremembering, or how I can bundle the complete source code of a class within the class file so that I do not have to look up the .java file when I want to check the source code?
Is there a compiler switch to do that?
javac has a -g option that adds additional debug information. Can someone tell me what information it adds? Without the -g switch, javac generates line number and source file information.
The main problem is that the generated class file only references a source file that might change. I simply want to bundle the source and class files together.
In Maven I could simply copy all the source files to the target directory, but that might be incompatible with Eclipse, IntelliJ and NetBeans (and whatnot).
Using a decompiler would also provide a way to extract a useful representation of the source code, since most decompilers use the line number information to position the decompiled structures accordingly within the source code.
Since some scenarios will require access to comments and a correct representation on a char-by-char level, a decompiler would be a second-rate solution.
One possible solution I found is defining a new class-file attribute (which is legal) that contains the source. Since the source is huge when compared to the class file, the content might be best compressed (yielding a 1:5 to 1:10 ratio).
This way the class file and the sources stay bundled.
The JVM specification requires Java Virtual Machine implementations to silently ignore attributes they do not recognize.
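The custom-attribute approach described above can be sketched without a bytecode library by walking the class file structure from the JVM specification. This is a minimal, unhardened sketch: the attribute name com.example.SourceCode and the class SourceAttributeAppender are hypothetical, the source is stored uncompressed (a java.util.zip.Deflater step could be added before writing the payload), and it does not check for an existing attribute of the same name. The new Utf8 constant is appended at the end of the constant pool so every existing index stays valid, and the attribute goes at the very end because the class-level attributes table is the last structure in a class file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SourceAttributeAppender {
    // Hypothetical attribute name; JVMS suggests package-style names for custom attributes.
    public static final String ATTR_NAME = "com.example.SourceCode";

    // Returns a copy of classBytes with one extra class-level attribute holding `source` (UTF-8).
    public static byte[] appendSource(byte[] classBytes, String source) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);      // big-endian, like the class file format
        if (in.getInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
        in.position(8);                                   // skip minor_version, major_version
        int cpCount = Short.toUnsignedInt(in.getShort());
        for (int i = 1; i < cpCount; i++) {               // walk the constant pool to find its end
            int tag = Byte.toUnsignedInt(in.get());
            int size;
            switch (tag) {
                case 1: size = Short.toUnsignedInt(in.getShort()); break;   // Utf8
                case 5: case 6: size = 8; i++; break;                        // Long/Double use 2 slots
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: size = 4; break;
                case 15: size = 3; break;                                    // MethodHandle
                case 7: case 8: case 16: case 19: case 20: size = 2; break;
                default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
            in.position(in.position() + size);
        }
        int cpEnd = in.position();
        in.position(cpEnd + 6);                           // access_flags, this_class, super_class
        int interfaces = Short.toUnsignedInt(in.getShort());
        in.position(in.position() + 2 * interfaces);
        for (int table = 0; table < 2; table++) {         // fields, then methods: same layout
            int count = Short.toUnsignedInt(in.getShort());
            for (int m = 0; m < count; m++) {
                in.position(in.position() + 6);           // access_flags, name_index, descriptor_index
                skipAttributes(in);
            }
        }
        int attrCountPos = in.position();                 // class attributes come last
        int attrCount = Short.toUnsignedInt(in.getShort());

        byte[] payload = source.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.write(classBytes, 0, 8);                  // magic and version, unchanged
            out.writeShort(cpCount + 1);                  // one more constant pool entry
            out.write(classBytes, 10, cpEnd - 10);
            out.writeByte(1);                             // CONSTANT_Utf8 tag; writeUTF emits the
            out.writeUTF(ATTR_NAME);                      // u2 length + modified UTF-8 it expects
            out.write(classBytes, cpEnd, attrCountPos - cpEnd);
            out.writeShort(attrCount + 1);
            out.write(classBytes, attrCountPos + 2, classBytes.length - attrCountPos - 2);
            out.writeShort(cpCount);                      // attribute_name_index = the new constant
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);            // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    private static void skipAttributes(ByteBuffer in) {
        int count = Short.toUnsignedInt(in.getShort());
        for (int a = 0; a < count; a++) {
            in.position(in.position() + 2);               // attribute_name_index
            int length = in.getInt();
            in.position(in.position() + length);
        }
    }
}
```

A patched class should still load in a standard JVM. Reading the source back would mean walking to the class attributes the same way and picking the one whose name index points at the com.example.SourceCode constant.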
I will invest in a wrapper around the javac application that ensures the source was not modified during compilation (and if it was, redoes the compilation) and, once compilation is done, adds the source code as a class-file attribute.
Since this will be incompatible with the IDE build cycle of Eclipse (and most likely IntelliJ and NetBeans), it will also require a special post-processor.
So integration will also require alternatives to the JavaBuilder.
Once the source code is attached to the class file in question, it is very easy to do a lot of advanced things with it that help with both maintaining and managing code. For me it's important that the source code and a class stay together and that the source information is 100% identical to the source code it was compiled from.
I think I am failing to understand Java package structure. It seems redundant to me that Java files have a package declaration inside them and are also required to be in a directory that matches the package name. For example, if I have a MyClass.java file:
package com.example;
public class MyClass {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}
Then I would be required to have this file located in com/example, relative to the base directory, and I would execute java com.example.MyClass from the base directory to run it.
Why wouldn't the compiler be able to infer the package name by looking at the directory structure? For example, if I compiled the file from the base directory with javac com\example\MyClass.java, I don't understand why MyClass.java wouldn't implicitly belong to the com.example package.
I understand there is a default package, but it still seems that the package declaration in the source file is redundant information?
As you (implicitly) acknowledged, you are not required to declare the name of a package in the case of the default package. Let us put that quibble aside ...
The reason for this seeming redundancy is that without a package declaration, the meaning of Java¹ source code would be ambiguous. For example, a source file whose pathname was "/home/steve/project/src/com/example/Main.java" could have 7 different fully qualified names, depending on how you compiled the code. Most likely only one of those will be the "correct" one, but you wouldn't be able to tell which FQN is correct by looking at (just) the one source file.
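To make the "7 different fully qualified names" concrete, here is a small sketch (FqnCandidates is a hypothetical helper) that lists every name the file could have if the package were inferred purely from directories, one candidate per possible choice of source root. It assumes the file name ends in .java and holds a single top-level class.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FqnCandidates {
    // One candidate FQN per possible source root, from the file's own directory upwards.
    public static List<String> candidates(Path file) {
        String fileName = file.getFileName().toString();
        String simpleName = fileName.substring(0, fileName.length() - ".java".length());
        List<String> result = new ArrayList<>();
        result.add(simpleName);                              // root = the file's own directory
        Path dir = file.getParent();
        String pkg = "";
        if (dir != null) {
            for (int i = dir.getNameCount() - 1; i >= 0; i--) {  // move the root up one level
                String segment = dir.getName(i).toString();
                pkg = pkg.isEmpty() ? segment : segment + "." + pkg;
                result.add(pkg + "." + simpleName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        candidates(Path.of("/home/steve/project/src/com/example/Main.java"))
                .forEach(System.out::println);
    }
}
```

For the path above it prints Main, example.Main, com.example.Main, src.com.example.Main, project.src.com.example.Main, steve.project.src.com.example.Main and home.steve.project.src.com.example.Main. The package declaration is what picks one of them.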
It should also be noted that the Java language specification does not require you to organize the source code tree according to the packages. That is a requirement of a (large) family of Java compilers, but a conformant compiler could be written that did not require this. For example:
The source code could be held in a database.
The source code could be held in a file tree with random file names².
In such eventualities, the package declaration would not be duplicative of file pathnames, or (necessarily) of anything. However, unless there was some redundancy, finding the correct source "file" for a class would be expensive for the compiler ... and problematic for the programmer.
Considerations like the above are the practical reason that most Java tool chains rely on file tree structure to locate source and compiled classes.
1 - By this, I mean a hypothetical dialect of Java which didn't require package declarations.
2 - The compiler would need to scan the file tree to find all Java files, and parse them to work out which file defined which class. Possible, but not very practical.
Turn the question on its head:
Assume that the package statement is the important thing - It represents the namespace of the class and belongs in the class file.
So now the question is - Why do classes have to be in folders that match their package?
The answer is that it makes finding them much easier - it is just a good way to organize them.
Does that help?
You have to keep in mind that packages do not just indicate the folder structure. The folder structure is the convention Java adopted to match the package names, just like the convention that the class name must match the filename.
A package is required to disambiguate a class from other classes with the same name. For instance java.util.Date is different from java.sql.Date.
The package also controls access: package-private methods and fields are accessible to other classes in the same package.
You have to see it the other way round. The class has all the information about itself, the class name and the package name. Then when the program needs it, and the class is not loaded yet, the JVM knows where to look for it by looking at the folder structure that matches the package name and the class with the filename matching its class name.
In fact there's no such obligation at all.
Oracle JDK's javac (and I believe most other implementations too) will happily compile your HelloWorld class, no matter what directory it is in and what package you declare in the source file.
Where the directory structure comes into the picture is when you compile multiple source files that refer to each other. At this point the compiler must be able to look them up somehow. But all it has in the source code is the fully qualified name of the referred class (which may not even have been compiled yet).
At runtime the story is similar: when a class needs to be loaded, its fully qualified name is the starting point. Now the class loader's job is to find a .class file (or an entry in a ZIP file, or any other imaginable source) based on the FQN alone, and again the simplest thing in a hierarchical file system is to translate the package name into a directory structure.
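The translation described above is simple enough to sketch. ClassFileLocator is hypothetical and is not the actual javac or class loader code; it maps a binary class name such as com.example.MyClass to com/example/MyClass.class (or .java, for a source path) and searches a list of directory roots in order, roughly mirroring what a directory entry on the class path does.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class ClassFileLocator {
    // com.example.MyClass + ".class" -> com/example/MyClass.class (relative path).
    // For nested classes pass the binary name, e.g. com.example.Outer$Inner.
    public static Path relativePath(String binaryName, String extension) {
        return Path.of(binaryName.replace('.', '/') + extension);
    }

    // Tries each root in order and returns the first existing file, like a class path search.
    public static Optional<Path> find(List<Path> roots, String binaryName, String extension) {
        Path relative = relativePath(binaryName, extension);
        for (Path root : roots) {
            Path candidate = root.resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(relativePath("com.example.MyClass", ".class"));
        System.out.println(find(List.of(Path.of(".")), "com.example.MyClass", ".class"));
    }
}
```

Nothing here needs to open or parse a file to decide where a class should live, which is the point of the convention: the fully qualified name alone determines the path.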
The only difference is that at runtime your "standalone" class also has to be loaded by the VM, so it needs to be looked up, so it should be in the correct folder structure (because that's how the default application class loader finds classes on the class path).
Can anyone explain how identical Java sources can end up compiling to binary differing class files?
The question arises from the following situation:
We have a fairly large application (800+ classes) which has been branched, restructured, then reintegrated back into the trunk. Prior to reintegration, we merged the trunk into the branch, which is standard procedure.
The end result was a set of directories with the branch sources and a set of directories with the trunk sources. Using Beyond Compare we were able to determine that both sets of sources were identical. However, on compiling (same JDK, using Maven hosted in IntelliJ v11) we noticed that about a dozen or so of the class files were different.
When we decompiled the source for each pair of apparently different class files we ended up with the same java source, so in terms of the end result, it doesn't seem to matter. But why is it that just a few of the files are different?
Thanks.
Additional thought:
If maven/javac compiles files in a different sequence, might that affect the end result?
Assuming that the JDK versions, build tool versions, and build / compilation options are identical, I can still think of a number of possible sources of differences:
Timestamps - class files may¹ contain compilation timestamps. Unless you run the compilations at exactly the same time, different compilations of the same file would result in different timestamps.
Source filenames - each class file normally records the name of its source file in the SourceFile attribute. javac stores only the file name (no directory), so compiling two trees at different locations should not by itself cause a difference, but renamed source files, or tools that record fuller paths, would.
Values of imported compile-time constants - when a class A uses a compile-time constant defined in another class B (see the JLS for the definition of a "compile-time constant"), the value of the constant is incorporated into A's class file. So if you compile A against different versions of B (with different values for the constants), the code of A is likely to be different.
Differences due to the compiler using identityHashCode-based HashMap keys could lead to differences in map iteration order in some step. This could affect .class file generation in a way that is not significant but still shows up as a .class file difference; for example, constant pool entries could end up in a different order.
Differences in signatures of external classes / methods; e.g. if you changed a dependency version in one of your POM files.
Differences in the effective build classpaths might result in differences in the order in which imported classes are found. This might in turn result in non-significant differences in the order of entries in the class file's Constant Pool. This could happen due to things such as:
files appearing in different order in the directories of external JAR files,
files being compiled in a different order due to the source files being in a different order when your build tool iterates them², or
parallelism in the build (if you have that enabled).
There is a possible workaround for the file-ordering problem: use the undocumented -XDsortfiles option as described in JDK-7003006. (Kudos to @Holger for knowing about that.)
Note that you don't normally see the actual order of files in file system directories. Commandline tools like ls and dir, and file browsers will typically sort the entries (in name or timestamp order) before displaying them.
1 - This is compiler dependent. Also, it is not guaranteed that javap will show the timestamps ... if they are present.
2 - The OS gives no guarantees that listing a directory (at the syscall level) will return the file system objects in a deterministic order ... or the same order, if you have removed and re-added files.
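The compile-time constant case is easy to reproduce. This sketch (ConstantInliningDemo, A and B are made-up names; it needs a JDK 11+ because it calls javax.tools.ToolProvider) compiles an unchanged A.java against two versions of B.java. Compiling twice against the same B should give identical A.class files, while changing B.LIMIT changes A.class, because the value is inlined into A's bytecode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ConstantInliningDemo {
    // A.java is the same text in every build; only B's constant changes.
    static final String A_SRC = "public class A { int limit() { return B.LIMIT; } }";

    // Compiles A.java together with a B.java whose LIMIT has the given value; returns A.class.
    public static byte[] compileA(int limit) {
        try {
            Path dir = Files.createTempDirectory("inline");
            Path a = Files.writeString(dir.resolve("A.java"), A_SRC);
            Path b = Files.writeString(dir.resolve("B.java"),
                    "public class B { public static final int LIMIT = " + limit + "; }");
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
            int rc = javac.run(null, null, null, "-d", dir.toString(), a.toString(), b.toString());
            if (rc != 0) throw new IllegalStateException("javac failed with exit code " + rc);
            return Files.readAllBytes(dir.resolve("A.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same B, identical A.class: " + Arrays.equals(compileA(10), compileA(10)));
        System.out.println("B changed, A.class differs: " + !Arrays.equals(compileA(10), compileA(20)));
    }
}
```

Both builds compile in different temporary directories, which also illustrates that javac's SourceFile attribute does not depend on the directory.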
I should add that the first step to identifying the cause of the differences is to work out exactly what they are. You probably need (needed) to do that the hard way: by manually decoding a pair of class files to identify the places where they actually differ ... and what the differences actually mean.
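Before decoding by hand, a small helper can at least tell you whether the headers agree and where two class files start to differ. ClassFileDiff is a hypothetical sketch, not an existing tool; it prints the class file version and constant pool count of each file, then the first differing byte offset using Arrays.mismatch (Java 9+). Running javap -v on both files is the natural next step for interpreting that offset.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ClassFileDiff {
    // Describes the fixed-size start of a class file: magic, version, constant_pool_count.
    public static String header(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);   // big-endian, like the class file format
        int magic = buf.getInt();
        int minor = Short.toUnsignedInt(buf.getShort());
        int major = Short.toUnsignedInt(buf.getShort());
        int constantPoolCount = Short.toUnsignedInt(buf.getShort());
        return String.format("magic=%08X version=%d.%d constant_pool_count=%d",
                magic, major, minor, constantPoolCount);
    }

    public static void main(String[] args) throws IOException {
        byte[] a = Files.readAllBytes(Path.of(args[0]));
        byte[] b = Files.readAllBytes(Path.of(args[1]));
        System.out.println(args[0] + ": " + header(a) + ", " + a.length + " bytes");
        System.out.println(args[1] + ": " + header(b) + ", " + b.length + " bytes");
        int offset = Arrays.mismatch(a, b);            // -1 if identical, else first differing index
        System.out.println(offset < 0 ? "identical" : "first difference at byte offset " + offset);
    }
}
```

A different constant_pool_count points towards extra or missing constants, while an equal count with an early difference often means the same constants in a different order, which fits the ordering explanations above.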
When you compare using Beyond Compare, the comparison is based on the contents of the files. But the build process checks only the timestamps of the source files for changes, so if a source file's last-modified date changes, it will be recompiled.
Different JDKs produce different binary classes (optimizations, but also the class version number). There are compilation options, too (a JDK may compile to an older format, or it can add debug information).
Different versions of Java can add different meta data which is often ignored by a decompiler.
I suggest you try using javap -c -v for more of the details in a file. If this doesn't help you can use the ASMifierClassVisitor which looks at every byte.
The same JDK can also produce different output depending on how you compile.
You can compile with or without debug info, or target an older version, and each option produces different class files.