Identical Java sources compile to binary differing classes

Can anyone explain how identical Java sources can end up compiling to binary differing class files?
The question arises from the following situation:
We have a fairly large application (800+ classes) which has been branched, restructured then reintegrated back into the trunk. Prior to reintegration, we merged the trunk into the branch, which is standard procedure.
The end result was a set of directories with the branch sources and a set of directories with the trunk sources. Using Beyond Compare we were able to determine that both sets of sources were identical. However, on compiling (same JDK using maven hosted in IntelliJ v11) we noticed that about a dozen or so of the class files were different.
When we decompiled the source for each pair of apparently different class files we ended up with the same java source, so in terms of the end result, it doesn't seem to matter. But why is it that just a few of the files are different?
Thanks.
Additional thought:
If maven/javac compiles files in a different sequence, might that affect the end result?

Assuming that the JDK versions, build tool versions, and build / compilation options are identical, I can still think of a number of possible sources of differences:
Timestamps - class files may [1] contain compilation timestamps. Unless you run the compilations at exactly the same time, different compilations of the same file would result in different timestamps.
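Source filename paths - each class file records the name of its source file (the SourceFile attribute). javac normally stores only the simple file name there, but if a compiler or tool records more of the path, compiling two trees with different pathnames will produce class files that contain different source pathnames.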
Values of imported compile-time constants - when a class A uses a compile-time constant defined in another class B (see the JLS for the definition of a "compile-time constant"), the value of the constant is incorporated into A's class file. So if you compile A against different versions of B (with different values for the constants), the code of A is likely to be different.
The compiler may use identity hash codes (System.identityHashCode) for HashMap keys, which could lead to differences in map iteration order in some step. This could affect .class file generation in a way that is not significant, but still shows up as a .class file difference. For example, constant pool entries could end up in a different order.
Differences in signatures of external classes / methods; e.g. if you changed a dependency version in one of your POM files.
Differences in the effective build classpaths might result in differences in the order in which imported classes are found. This might in turn result in non-significant differences in the order of entries in the class file's Constant Pool. This could happen due to things such as:
files appearing in different order in the directories of external JAR files,
files being compiled in different order due to the source files being in different order when your build tool iterates them [2], or
parallelism in the build (if you have that enabled).
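To make the compile-time constant item in the list above more concrete, here is a minimal sketch. All names (Limits, MAX_USERS, MAX_USERS_LAZY, Client) are made up for illustration; the point is only the difference between a constant that javac copies into the caller and a field that is read at run time.

```java
// Hypothetical names throughout; this only illustrates the inlining rule.
final class Limits {
    // A compile-time constant: javac embeds the value 100 as a literal
    // in every class that reads this field.
    static final int MAX_USERS = 100;

    // Not a compile-time constant, because the initializer is a method call.
    // Callers read this field from Limits at run time instead.
    static final int MAX_USERS_LAZY = Integer.valueOf(100);
}

final class Client {
    // Client.class contains the literal 100 here. If Limits is later
    // recompiled with a different value and Client is not, Client still
    // returns 100, and its class file differs from a freshly compiled one.
    static int inlined() {
        return Limits.MAX_USERS;
    }

    // Client.class contains a field read here, so its bytes do not depend
    // on the value stored in Limits.
    static int lookedUp() {
        return Limits.MAX_USERS_LAZY;
    }

    public static void main(String[] args) {
        System.out.println(inlined() + " " + lookedUp());
    }
}
```

Running javap -c on Client should show the difference between the two methods: a pushed literal in one, a getstatic in the other.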
There is a possible workaround for the problem with file ordering: use the undocumented -XDsortfiles option as described in JDK-7003006. (Kudos to @Holger for knowing about that.)
Note that you don't normally see the actual order of files in file system directories. Commandline tools like ls and dir, and file browsers will typically sort the entries (in name or timestamp order) before displaying them.
[1] - This is compiler dependent. Also, it is not guaranteed that javap will show the timestamps, if they are present.
[2] - The OS gives no guarantees that listing a directory (at the syscall level) will return the file system objects in a deterministic order, or in the same order if you have removed and re-added files.
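As a rough illustration of the directory-ordering point, the sketch below lists the .java files in a directory and sorts the names before using them, which is one way a build script could get a stable compile order. The class and method names (SourceOrder, sortedSources, demo) are hypothetical and not part of any build tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SourceOrder {
    // Returns the names of the .java files in dir in sorted order.
    static List<String> sortedSources(Path dir) {
        List<String> names = new ArrayList<>();
        // newDirectoryStream makes no promise about iteration order,
        // so the raw order can differ between machines or checkouts.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.java")) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // impose a deterministic order
        return names;
    }

    // Creates a throwaway directory with three empty source files and lists it.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("order-demo");
            for (String name : new String[] {"C.java", "A.java", "B.java"}) {
                Files.createFile(dir.resolve(name));
            }
            return sortedSources(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```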
I should add that the first step to identifying the cause of the differences is to work out exactly what they are. You probably need (needed) to do that the hard way - by manually decoding a pair of class files to identify the places where they actually differ, and what the differences actually mean.
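A first pass at that manual decoding could look like the sketch below: find the offset of the first differing byte and print the fixed class file header (magic number, minor and major version). This is only a starting point under the assumption that both files fit in memory; ClassFileDiff and its methods are names invented here, and a full decoder would go on to walk the constant pool.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ClassFileDiff {
    // Index of the first differing byte, or -1 if the arrays are identical.
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: either identical, or one file is longer than the other.
        return a.length == b.length ? -1 : common;
    }

    // Decodes the 8-byte header: magic (u4), minor_version (u2), major_version (u2).
    static String header(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile); // big-endian by default, like class files
        int magic = in.getInt();
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        return String.format("magic=%08X major=%d minor=%d", magic, major, minor);
    }

    // Usage: java ClassFileDiff trunk/Foo.class branch/Foo.class
    public static void main(String[] args) throws IOException {
        byte[] left = Files.readAllBytes(Paths.get(args[0]));
        byte[] right = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(header(left));
        System.out.println(header(right));
        System.out.println("first difference at offset " + firstDifference(left, right));
    }
}
```

The reported offset can then be matched against the javap -v output to see which structure (constant pool entry, attribute, method body) it falls into.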

When you compare using Beyond Compare, the comparison is based on the contents of the files. But in the build process only the timestamps of the source files are checked for changes. So if your source file's last-modified date changes, it will be recompiled.

Different JDKs produce different binary classes (optimizations, but also the class version number). There are compilation options, too (a JDK may compile to an older format, or it can add debug information).

Different versions of Java can add different metadata, which is often ignored by a decompiler.
I suggest you try using javap -c -v for more of the details in a file. If this doesn't help you can use the ASMifierClassVisitor which looks at every byte.

The same JDK can also produce different output depending on how you compile.
You can compile with or without debug info, or you can compile to run on an older version; each option will result in different classes.

Related

How many Classpath can be specified on Java command line?

I have to run a Java task with a very large number of classpath entries (1000, totaling 150k characters if concatenated).
The problem is that java returns an error when I try to execute this class:
/jdk/JAVA8/bin/java: Argument list too long
The error code is 7
I've tried to set the classpath using "export CLASSPATH=CLASSPATH:....." so that I wouldn't have to specify it through the -cp java parameter, but it returned the same error.
I'm pretty sure that the problem revolves around a classpath limit, because if I delete some of the classpath entries, the error disappears (but then I get logical errors in the execution, because I need all of the entries).

You could use classpath wildcards. Especially if many of your jars/class files are in the same directory, this would help a lot.
It could be an environment variable size limit or a command-line size limit rather than a javac classpath argument limit.
javac can also take its arguments from a file. You can put all your arguments in that file and pass the file to the command as a single argument. See the javac documentation on argument files for more.

You didn’t hit a Java-specific limitation, but a system-dependent limit. This is best illustrated by the fact that the attempt to set the CLASSPATH variable fails as well, even though setting an environment variable via export name=value in the shell isn’t related to Java.
As said by others, you could try to use wildcards for jar files within the same directory, but you have to take care that Java does the expansion rather than the shell, as in the latter case, it would again yield a command line that is too long. So you have to escape the * character to ensure it will not be processed by the shell.
javac supports reading the command line arguments from an external file specified via @filename, but unfortunately, the java launcher doesn’t support this option (the java launcher only gained @argument files in JDK 9, and the question is about Java 8).
An alternative would be to create symbolic links pointing to the jar files, having shorter paths and specifying these. You could even combine the approaches by creating one directory full of symbolic links and specifying that/directory/* as class path.
But there seems to be a logical error in the requirement. In a comment, you are mentioning “code analysis” and an analyzing tool should not require having the code to analyze in its own application class path. You can access class files via ordinary I/O, if you want to read and parse them. In case you want to load them, e.g. for using the builtin Reflection, you can create new ClassLoader instances pointing to the locations. So the tool doesn’t depend on the application class path and could read the locations from a configuration file, for example.
Using distinct class loaders has the additional advantage that you can close them when you’re done.
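A minimal sketch of that approach follows, assuming the locations have already been read from some configuration file into a list of strings. AnalysisLoader, toUrls and countDeclaredMethods are hypothetical names; only URLClassLoader and the reflection calls are standard API.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.List;

final class AnalysisLoader {
    // Converts jar or directory locations into URLs for a URLClassLoader.
    // Note: a directory only gets the trailing "/" that URLClassLoader
    // expects if it exists at the time of the call.
    static URL[] toUrls(List<String> locations) {
        URL[] urls = new URL[locations.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = Paths.get(locations.get(i)).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(locations.get(i), e);
            }
        }
        return urls;
    }

    // Loads one class from the given locations, inspects it via reflection,
    // and closes the loader again so the jar files are released.
    static int countDeclaredMethods(List<String> locations, String className)
            throws IOException, ClassNotFoundException {
        try (URLClassLoader loader = new URLClassLoader(toUrls(locations))) {
            // initialize=false: load the class without running its static initializers
            Class<?> type = Class.forName(className, false, loader);
            return type.getDeclaredMethods().length;
        }
    }
}
```

Because the locations are passed as data rather than on the command line, the 150k-character classpath never has to fit into the argument list.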

JVM does not limit classpath length. However, there is a hard OS limit on command line length and environment variables size.
On Linux check getconf ARG_MAX to see the limit.
On older kernel versions it is only 128KB. On newer kernels it is somewhere around 2MB.
If you want to set really long classpaths, you may need a JAR-Manifest trick. See this question for details.
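The manifest trick usually means creating an otherwise empty "pathing" JAR whose manifest lists the real jars in its Class-Path attribute, and then putting only that one JAR on the command line. The sketch below shows one way to build such a JAR with the standard java.util.jar API; PathingJar and its methods are names chosen for this example, and the entries are assumed to be URLs relative to the location of the pathing JAR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class PathingJar {
    // Builds a manifest whose Class-Path lists the given entries, space-separated.
    static Manifest buildManifest(List<String> relativeJarUrls) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without Manifest-Version the main attributes are not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", relativeJarUrls));
        return manifest;
    }

    // Writes a JAR that contains nothing but the manifest.
    static void write(Path target, Manifest manifest) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            // No entries needed: the manifest alone carries the classpath.
        }
    }
}
```

The application would then be started with something like java -cp pathing.jar followed by the main class, which keeps the command line short regardless of how many jars the manifest references.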

How the flavor-specific variants are working?

If you need flavors, you go to build.gradle and add the flavors that you need, like this:
productFlavors {
    mock {
        applicationIdSuffix = ".mock"
    }
    prod {}
}
and then you need to create a corresponding directory, like /src/prod/java/.
Here is how I thought it should work: for the chosen build variant, for example prodDebug, Android Studio takes the main source set as a base and substitutes the corresponding classes from the directory that matches the chosen build variant.
But then I found this snippet, which says:
Files in the flavor-specific folders do not replace files in the main source set. Trying to do that will result in a duplicate class exception. This is a common misconception because it's how resources are merged.

Ok, so with basic configuration with flavors, you have two kinds of source sets:
main source set
flavor-specific source sets, like your mock and prod
With standard buildTypes configuration (debug and release), this gives you the following build variants (combinations of build types and product flavors):
mockDebug
mockRelease
prodDebug
prodRelease
Each one of them uses every source set that corresponds with flavor/type name and the main set, so for example, the prodRelease will use all of the following source sets at once:
/src/main
/src/prod
/src/release
Effectively, the build system will 'merge' all of these into one source set, and that means if there are classes with the same path and name in these sets, a name clash occurs and the compiler will fail.
The way to use source sets correctly is to omit the class that you need to be different for each set from the main set, but instead provide it with all the sets for each flavor / each buildType, for example:
main set has class A.java that references class B.java. B.java is omitted from main set.
Different B.java files are included in the mock and prod sets (of course, they don't need to be different, but they need to provide the same interface, preferably with the interface itself included in the main set).
Compiler uses B.java from the set that is being used by the current configuration - build variant, so either the mock or the prod one.
Yay! Now you have two functionally different product flavors.
This behavior isn't limited to classes: you can use flavor-specific or type-specific resources, AndroidManifest.xml files and just about anything that goes into the source dir.
Tip: In Android Studio you can see in the 'project files' section which files will be chosen to compile for a specific variant. To switch build variants, hit Cmd+Shift+A (mac keymap) and search for Build Variants phrase. It usually shows also on the left bottom side of the Android Studio window.
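The A/B arrangement described above can be sketched in plain Java as follows. The package name demo, the Backend interface and the method names are assumptions made for this example; the file paths in the comments show which source set each type would live in. Only the prod variant of B is compiled here, since two classes named B cannot be compiled together.

```java
// src/main/java/demo/Backend.java (shared contract, lives in the main set)
interface Backend {
    String fetch();
}

// src/main/java/demo/A.java (lives in the main set, refers to B without defining it)
final class A {
    String describe() {
        return "A got: " + new B().fetch();
    }
}

// src/prod/java/demo/B.java (prod flavor only)
final class B implements Backend {
    @Override
    public String fetch() {
        return "live data";
    }
}

// src/mock/java/demo/B.java (mock flavor only). Shown as a comment because
// defining B twice in one compilation is exactly the name clash that the
// build reports when the same class sits in two merged source sets.
//
// final class B implements Backend {
//     @Override
//     public String fetch() {
//         return "canned data";
//     }
// }
```

With the prod flavor selected, new A().describe() yields "A got: live data"; selecting mock would swap in the other B without touching A.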

The code from the main source set will always make it into the APK. The source files in other source sets will only be merged if the correct build variant is used. For example, you can create two files:
src/mock/java/yourpackage/MyClass.java
src/prod/java/yourpackage/MyClass.java
Depending on whether you're building prod or mock variant, one of those classes will be compiled and packaged with the APK. Same works for debug and release: you can have code and resources that are only packaged into debug or release versions of the app.

Java: Class files containing source code?

I was inspecting the class file format since I wanted to add source code to the class file (which was possible in early Java versions), but all I found was the SourceFile attribute and the SourceDebugExtension attribute. I was looking for a way to bundle the complete source code of the class with the class file to ease the post-processing pipeline.
Does anyone know if my memories are wrong or how I can bundle the complete source code of a class within the class file so that I do not have to look up for the java-file when I want to check the source code?
Is there a compiler switch to do that?
javac has a -g option that adds additional debug information. Can someone tell me what information it adds? Without the -g switch it generates line number and source file information.
The main problem I have is generate a class file but only have a reference to a source file that might change. I want simply to bundle up source and class file.
In Maven I can simply copy all the source files over to the target directory, but that might be incompatible with the Eclipse, IntelliJ and NetBeans IDEs (and what not).

Using a decompiler also provides a way to extract a useful representation of the source code, since most decompilers honor the line number information and position the decompiled structures accordingly within the source code.
Since some scenarios will require access to comments and a correct representation on a char by char level, the decompiler would be a second rate solution.

One possible solution I found is defining a new class-file attribute (which is legal) that contains the source. Since the source is huge when compared to the class file, the content might be best compressed (yielding a 1:5 to 1:10 ratio).
This way the class file and the sources stay bundled.
The JVM specification requires JVM implementations to silently ignore attributes they do not recognize, so the extra attribute does not affect execution.
I will invest in a wrapper around javac that ensures the source was not modified during compilation (and if it was, redoes the compilation) and, once compilation is done, adds the source code as a class-file attribute.
Since this will be incompatible to the IDE-build cycle of Eclipse (and most likely IntelliJ and NetBeans) it will also require a special post processor.
So integration will also require alternatives to the JavaBuilder.
Once the source code is attached to the class file in question, it is very easy to do a lot of advanced things with it that help with both maintaining and managing code. For me it's important that the source code and the class stay together and that the source information is 100% equal to the source code it was compiled from.
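The compression step of that idea might look like the sketch below, which deflates the source text into a byte array suitable for use as the body of a custom attribute and inflates it back unchanged. SourcePayload is a hypothetical name. Actually attaching the bytes to a class file is not shown: that requires rewriting the class file (adding a constant pool entry for the attribute name and appending the attribute), typically with a bytecode library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class SourcePayload {
    // Compresses the source text; the result would become the attribute body.
    static byte[] compress(String source) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(source.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the exact source text, character for character.
    static String decompress(byte[] payload) {
        Inflater inflater = new Inflater();
        inflater.setInput(payload);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid payload");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because the round trip is lossless, comments and whitespace survive, which is the property a decompiler cannot offer.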

Where does HP Fortify put the intermediate files?

According to the HP Fortify documentation, the Static Code Analyzer first translates the source code into an intermediate format, and then it scans the translated code and generates a vulnerability report.
It says the translation can be done using the following Ant code:
<antcall target="compile">
<param name="build.compiler" value="com.fortify.dev.ant.SCACompiler"/>
</antcall>
This will call your "compile" target but force it to use the SCACompiler instead of the regular javac compiler.
I have run Fortify on our Java code and it produces vulnerability reports. But I do not see the intermediate files anywhere. I ran a diff between the Java class files that the regular javac compiler produced and the Java class files that the SCACompiler produced, and they were exactly the same. Are the intermediate files stored somewhere else, or does Fortify automatically delete them after performing the scan?

The intermediate files are not class or object files. They are NST (Normalized Syntax Tree) files, a proprietary format used by HP Fortify (this is discussed in the book "Secure Programming with Static Analysis"). When translating with a build ID, such as:
sourceanalyzer -b test ant
the NST files are stored in the project working directory. On Windows, typically:
%USERPROFILE%\AppData\Local\Fortify\sca<version>\build\test
or on other platforms:
~/.fortify/sca<version>/build/test
This directory then contains the canonicalized path to the NST files, as produced during the translation. These can then be used to scan multiple times if needed, but should be "cleaned" before scanning a separate new (or updated) codebase.
For ant integration I think it depends on which version of Ant, and the way you are translating, but this way I think it just calls the sourceanalyzer.jar file (which contains the com.fortify.dev.ant.SCACompiler class) in order to hook into the JVM and follow the build to create the NST files needed for scanning. I don't believe it's actually a separate version of javac, although perhaps there is a separate version under <SCA installation directory>/jre/ which it may use.

Lavamunky is correct about the default path for the working directory. You can change this in the following locations:
1. FortifyInstallRoot\Core\config\fortify.properties: com.fortify.WorkingDirectory
2. FortifyInstallRoot\Core\config\fortify-sca.properties: com.fortify.sca.ProjectRoot
Note that you need to use / as the path delimiter instead of \ inside of the config files. Inside of the folder specified by those paths, the pattern is: sca\build\.
You can also specify these at runtime:
sourceanalyzer -b MyBuild -Dcom.fortify.WorkingDirectory=C:\Fortify\Work -Dcom.fortify.sca.ProjectRoot=C:\Fortify\Work
The path to the working files would then be:
C:\Fortify\Work\sca<version>\build\MyBuild\

Classloading problems for the same java code, two .class files?

If I have a java class and:
- I compile the class and include it in a jar, A
- compile separately the same class and include it in a different jar, B
(I know it's not politically right to do this...etc)
(the compilation is done against the same jdk, on the same machine)
If I put these two jars in the same war - can I get class loading problems?

Two ways to get into trouble:
Have two externally different classes by the same name, such that other classes that are compiled against one will not be valid referencing the second.
Have two identical copies of the class (or even the same copy) and manage (through one of several means) to load it twice with two different class loaders.
But having the same (from an external attributes standpoint) class twice in your classpath is not a problem -- the first one in the JAR search order will always be loaded.

No. You'll simply get the first copy it finds. If they're in the same package, you'll effectively never see that other class.
And it's not "politically" wrong to do this. It's fundamentally a bug.
