Understanding Java Byte Code - java

Often I am stuck with a Java class file and no source, trying to understand the problem at hand.
Note that a decompiler is useful but not sufficient in all situations...
I have two questions:
What tools are available to view Java bytecode (preferably from the Linux command line)?
What are good references for getting familiar with Java bytecode syntax?

Rather than looking directly at the Java bytecode, which will require familiarity with the Java virtual machine and its operations, one could try to use a Java decompiling utility. A decompiler will attempt to create a java source file from the specified class file.
The related question "How do I “decompile” Java class files?" is a good starting point for finding out how to decompile Java class files.
That said, one could use the javap command which is part of the JDK in order to disassemble Java class files. The output of javap will be the Java bytecode contained in the class files. But do be warned that the bytecode does not resemble the Java source code at all.
The definitive source for learning about Java bytecode and the Java Virtual Machine itself is The Java Virtual Machine Specification, Second Edition. In particular, Chapter 6: The Java Virtual Machine Instruction Set has an index of all the bytecode instructions.

To view the bytecode instructions of a class file, use the javap -v command the same way you would run a Java program: specify the classpath (if necessary) and the class name.
Example:
javap -v com.company.package.MainClass
For the bytecode instruction set, see Instruction Set Summary.
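The same disassembly can also be driven from Java code. This is only a sketch, assuming a full JDK 9 or later (where java.util.spi.ToolProvider can look up the javap tool); the class name JavapDemo is made up for illustration. It passes the same arguments you would type on the command line.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Optional;
import java.util.spi.ToolProvider;

class JavapDemo {
    /** Runs javap in-process on the named class and returns its textual output. */
    static String disassemble(String className) {
        // Assumption: a full JDK 9+ is running this; a bare runtime has no "javap" tool.
        Optional<ToolProvider> javap = ToolProvider.findFirst("javap");
        if (!javap.isPresent()) {
            return "";
        }
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        // Equivalent to the command line: javap -c <className>
        javap.get().run(writer, writer, "-c", className);
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        System.out.println(disassemble(target));
    }
}
```

Swap "-c" for "-v" to get the verbose output (constant pool, attributes and so on) mentioned above.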

Fernflower is an analytical decompiler, so it will decompile classes to readable Java code instead of bytecode. It's much more useful when you want to understand how code works.

If you have a class and no source code, but you have a bug, you can do one of two basic things:
1. Decompile, fix the bug and recreate the jar file. I have done this before, but sysadmins are leery about putting that into production.
2. Write unit tests for the class, determine what causes the bug, report the bug with the unit tests and wait for it to be fixed.
(2) is generally the one that sysadmins, in my experience, prefer.
If you go with (2) then, in the meantime, since you know what causes the bug, you can either not allow that input to go to the class, to prevent a problem, or be prepared to properly handle it when the error happens.
You can also use AspectJ to inject code into the problem class and change the behavior of the method without actually recompiling. I think this may be the preferable option, as you can change it for all code that may call the function, without worrying about teaching everyone about the problem.
If you learn to read the bytecode instructions, what will you do to solve the problem?

I have two questions
1) What tools are available to view Java bytecode (preferably available from the Linux command line)?
The javap tool (with the -c option) will disassemble a bytecode file. It runs from the command line, and is supplied as part of the Java SDK.
2) What are good references to get familiar with java byte code syntax
The javap tool uses the same syntax as is used in the JVM specification, and the JVM spec is naturally the definitive source. I also spotted "Inside the Java Virtual Machine" by Bill Venners. I've never read it, and it looks like it might be out of print.
The actual (textual) syntax is simple and self-explanatory ... assuming that you have a reference that explains what the bytecodes do, and that you are moderately familiar with reading code at this level. But it is likely to be easier to read the output of a decompiler, even if the bytecode has been fed through an obfuscator.
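To give a feel for that syntax, here is a tiny example. The class Adder is made up for illustration; the comments show what javap -c prints for the add method when it is compiled with javac.

```java
class Adder {
    // javap -c Adder shows this method body as four instructions
    // (the number before the colon is the byte offset in the method):
    //   0: iload_0    push local variable 0 (a) onto the operand stack
    //   1: iload_1    push local variable 1 (b)
    //   2: iadd       pop two ints, push their sum
    //   3: ireturn    return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

Each mnemonic can be looked up by name in the instruction set chapter of the JVM specification.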

You might find the Eclipse Byte Code Outline plugin useful:
http://andrei.gmxhome.de/bytecode/index.html
I have not used it myself - just seen it mentioned in passing.

Related

java - Compile code on client side without JDK

I have a question that I'm pretty confused about.
I am aware of the differences between the Java Runtime Environment and the Java Development Kit.
I'm writing a program that uses the ToolProvider.getSystemJavaCompiler() method to compile Java code from within the code.
Now, I've been told that I can't compile code on the client side if my client doesn't have the JDK installed. My main question is, how can I do that? I don't want my clients to have to install the JDK on their computers just to run my program.
Thanks in advance!
You need to compile it on your system and distribute the class file for each Java source file.
Running that class file doesn't require the JDK, but a JRE must be installed on the target system.
If you want to compile code, you need a compiler, so if the user can't be expected to have the compiler you need, you'll simply have to bundle it.
I really can't say I know how to bundle the standard javac compiler, though it's probably possible, strictly speaking, to find the Jar file that contains it and bundle that along with your code. No idea how robust such a solution would be, though.
But depending on your needs, you may not need the standard javac. There are tons of byte-code generation libraries out there, with more or less high-level functionality. I wouldn't really want to recommend anything that I have no personal experience with, but examples include Byte Buddy or ASM. You could probably use ABCL too.
Eclipse's compiler is worth a look as well.
There is also an SO question here.
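Whichever compiler ends up being available, the calling code has to cope with ToolProvider.getSystemJavaCompiler() returning null, which is what happens when the platform provides no compiler. A minimal sketch of compiling source held in a String; the names StringCompiler and StringSource are made up, and the output goes to a temporary directory purely for illustration:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class StringCompiler {
    /** Wraps a String so that the compiler can read it like a source file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns true if the source compiled; false on errors or when no compiler exists. */
    static boolean compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Typical for a JRE-only installation: there is nothing to compile with.
            return false;
        }
        try {
            Path outDir = Files.createTempDirectory("compiled");
            List<String> options = Arrays.asList("-d", outDir.toString());
            return compiler.getTask(null, null, null, options, null,
                    Collections.singletonList(new StringSource(className, code))).call();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("Hello", "public class Hello { }"));
    }
}
```

On a machine with only a JRE this prints false instead of failing with a NullPointerException, which at least lets the program report the missing compiler cleanly.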
So there really is no way to do what you want, which is to turn a String containing source code into a Java class, unless you bundle the compiler itself with your application or find a library that already contains all of the Java compiler code, so that it doesn't have to use the JDK compiler.
I do not understand what you wish to accomplish, but the BEST option I can give you is asm. If you are up for the task, you can manually write new classes at runtime without the presence of the JDK compiler. HOWEVER, this does not involve you using a String full of source code and turning it into a Class object. This is you working at the low level with the Java bytecode for the most part.
This tutorial can get you started:
https://www.javaworld.com/article/2071777/design-patterns/add-dynamic-java-code-to-your-application.html
And here is the Java documentation for class files. You can use this to expand on what you learned from the first link:
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html
That is the only way of creating classes on the fly that I can give you. That being said, you could try writing your own Java compiler that can turn source code into classes without ever needing the JDK compiler, but at that point you are literally recreating the Java compiler yourself, and I assure you that is no easy feat for one person.

How does the JVM distinguish between Scala bytecode and Java bytecode?

Scala also produces bytecode that is executed by the JVM. I am wondering how the JVM distinguishes between Scala bytecode and Java bytecode. Can anyone please explain?
scalac Myprogram.scala
java Myprogram
So are these statements perfectly fine?
I am wondering how the JVM distinguishes between Scala bytecode and Java bytecode.
It doesn't. There is no such thing as Scala bytecode. The Scala compiler compiles to JVM bytecode. Just like the Java compiler also compiles to JVM bytecode.
The JVM doesn't know anything about Scala. It doesn't know anything about Java, either. Nor does it know anything about Groovy, Clojure, Kotlin, Ceylon, Fantom, Ruby, Python, ECMAScript, or any of the other ~400 programming languages for which there are implementations on the JVM.
The JVM only knows about one language: JVM bytecode.
Note that this is really no different from any other machine, virtual or not. The CLR only knows about CIL, it knows nothing about C#, VB.NET, or F#. An Intel Core CPU knows only about AMD64 and x86 machine code, it knows nothing about C, C++, Objective-C, Swift, Go, Java, Python. The CPython VM only knows about CPython bytecode, it knows nothing about Python.
It doesn't. Scala compiles to the same bytecode as Java.
Both scalac and javac generate bytecode. The JVM doesn't care how the bytecode was produced, it's all the same to the JVM.
However, scala and java set up the boot CLASSPATH differently, so if your code contains Scala Runtime Library calls, and it very likely will, it needs to be run by scala, not java.
You can set up the boot CLASSPATH manually using java, if you absolutely have to, but why go through all that extra work when scala will do it for you?
Scala compiles to normal Java bytecode, so the JVM doesn't see any difference. The extra features of Scala that Java doesn't have are implemented through a combination of compile-time passes and runtime helper functions. If you disassemble generated Scala classes, you'll probably see tons of calls to the Scala runtime for stuff like boxing and unboxing arguments.
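One way to convince yourself: every class file starts with the same header, whichever compiler wrote it. The sketch below reads the magic number and major version from a class file on the class path; ClassHeader is a made-up name, and java/lang/Object.class is used only because it is always present. Pointing it at a class produced by scalac should show the same magic number.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassHeader {
    /** Returns {magic, major_version} read from the start of a class file resource. */
    static int[] read(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("not found: " + resourceName);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // 0xCAFEBABE for every class file
            data.readUnsignedShort();              // minor_version, skipped
            int major = data.readUnsignedShort();  // class file format version
            return new int[] { magic, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String resource = args.length > 0 ? args[0] : "java/lang/Object.class";
        int[] header = read(resource);
        System.out.println(Integer.toHexString(header[0]) + " major=" + header[1]);
    }
}
```

Nothing in this header names the source language; the JVM checks the format version, not who produced the file.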
The bytecode as defined in the standard is the same from the point of view of the executor. But when you work with a debugger, more information is needed to match the correct language and source files.
The only way to indicate the source code from which the JVM bytecode was generated is through the optional class attributes SourceFile (since Java 1.0.2) and SourceDebugExtension (since Java 5.0).
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.10 and 11
Additional information maps the bytecode back to line numbers in the source: the LineNumberTable attribute, which is also optional.
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.12
You can find the equivalent for other JVM specs.
These are used by IDEs to get the context of a JVM class, but they are optional values, and the debug information present depends on the compiler.
So they can hint at which compiler was used, but you cannot rely on them. If this is defined anywhere, it should be in the compiler's documentation.
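These optional attributes also surface at runtime through stack traces: StackTraceElement.getFileName() generally corresponds to the SourceFile attribute, and getLineNumber() is derived from LineNumberTable. A small sketch (DebugInfoProbe is a made-up name) that prints whatever its own class file carries:

```java
class DebugInfoProbe {
    /** Describes this method's location using the debug attributes of its class file. */
    static String whereAmI() {
        // Frame 0 is the method that created the Throwable, i.e. whereAmI itself.
        StackTraceElement frame = new Throwable().getStackTrace()[0];
        // getFileName() is null when the class has no SourceFile attribute;
        // getLineNumber() is negative when there is no LineNumberTable.
        return frame.getClassName()
                + " compiled from " + frame.getFileName()
                + ", line " + frame.getLineNumber();
    }

    public static void main(String[] args) {
        System.out.println(whereAmI());
    }
}
```

Compiling the same class with -g:none and running it again should make the file name and line number disappear, which is exactly why they cannot be relied on.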
With scalac you can control which information is stored with the -g option, the same as with javac except for notc, as Java doesn't do TCO:
-g:{none,source,line,vars,notc}
"none" generates no debugging info,
"source" generates only the source file attribute,
"line" generates source and line number information,
"vars" generates source, line number and local variable information,
"notc" generates all of the above and will not perform tail call optimization.
By default line
To retrieve this information from class files you can use the open source library ASM: http://asm.ow2.org/
You can also parse the class file or internal storage yourself; retrieving class attributes is not very difficult.

What does the 'require "java"' statement do in JRuby scripts?

I read about Java interoperability in Ruby, so using JRuby is an obvious choice. But somehow I don't really grasp the idea behind require 'java'. The documentation says:
... will give you access to any bundled Java libraries (classes within your java class path). However, this will not give you access to any non-bundled libraries.
Is there a more detailed explanation?
To be more precise I don't understand why the following code works without require "java":
$ export CLASSPATH=".:lib/opennlp-tools-1.6.0.jar"
$ jruby -e 't = Java::OpennlpToolsTokenize::SimpleTokenizer.new; puts t.tokenize("I went to school").to_a'
There are two parts to this question which need answering and some clarification we should make to our documentation (I made an attempt already in https://github.com/jruby/jruby/wiki/CallingJavaFromJRuby):
1. require 'java'. It loads the ability to load Java classes and treat them as if they were Ruby objects/classes. However, since JRuby 1.7.x, JRuby internally needs to require 'java', so it has already been required by the time your expression is evaluated. So technically it is true that "require 'java'" loads Java interoperability, but since our kernel does this now, it is largely a no-op by the time you call it (see the return value of the require). We still recommend putting it at the top of any file where you use Java interop, just so it is documented in your code. Also, the fact that it happens to be loaded is more of an implementation detail than a semantic one (e.g. in the distant future we maybe won't require it in our kernel).
2. Unclear verbiage: "However, this will not give you access to any non-bundled libraries.". If you want to access a library that is not in your CLASSPATH (this was stipulated in the parentheses), you need to add it to your LOAD_PATH (or require it directly). I tweaked that sentence to hopefully make it clearer.

How to compile Java files that put .class files directly into JAR

First some reference:
1st Link
2nd link
The first article (1st Link) mentions compiling Java files directly into JAR files, which avoids one step in the build process. Does anyone know how to do this?
-Vadiraj
As you linked to my blog post I thought it was only fair to give you an update.
Compiling directly to a Jar is actually fairly simple to do. Basically you extend
javax.tools.ForwardingJavaFileObject
Then override the openOutputStream method and direct it to your Jar. As the Java compiler is highly concurrent but writing to a jar file is highly sequential, I'd recommend that you buffer to byte arrays and have a background thread write the byte arrays as they arrive.
I do exactly this in my experimental build tool JCompilo https://code.google.com/p/jcompilo/
This tool is completely undocumented but crazy fast. Currently it's about 20-80% faster than any other Java build tool and about 10x faster than the Scala compiler for the same code.
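The approach described above could look roughly like this. It is only a sketch and not JCompilo's actual code: the names JarCompiler, Source and BufferingManager are made up, the source comes from a String to keep the example self-contained, and the buffered classes are written to the jar sequentially after compilation rather than by a background thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.ForwardingJavaFileObject;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class JarCompiler {
    /** In-memory source file, so the sketch needs no files on disk. */
    static final class Source extends SimpleJavaFileObject {
        private final String code;
        Source(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Hands the compiler output files whose bytes land in a buffer, not on disk. */
    static final class BufferingManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
        final Map<String, ByteArrayOutputStream> classes = new LinkedHashMap<>();

        BufferingManager(StandardJavaFileManager delegate) {
            super(delegate);
        }

        @Override
        public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                String className, JavaFileObject.Kind kind, FileObject sibling) throws IOException {
            JavaFileObject target = super.getJavaFileForOutput(location, className, kind, sibling);
            return new ForwardingJavaFileObject<JavaFileObject>(target) {
                @Override
                public OutputStream openOutputStream() {
                    // Redirect the class bytes into memory, keyed by jar entry name.
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    classes.put(className.replace('.', '/') + ".class", buffer);
                    return buffer;
                }
            };
        }
    }

    /** Compiles one class into jarFile and returns the names of the jar entries written. */
    static List<String> compileToJar(String className, String code, Path jarFile) {
        List<String> written = new ArrayList<>();
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return written; // no compiler available on this runtime
        }
        try (BufferingManager manager =
                new BufferingManager(compiler.getStandardFileManager(null, null, null))) {
            boolean ok = compiler.getTask(null, manager, null, null, null,
                    Collections.singletonList(new Source(className, code))).call();
            if (!ok) {
                return written;
            }
            try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(jarFile))) {
                for (Map.Entry<String, ByteArrayOutputStream> entry : manager.classes.entrySet()) {
                    jar.putNextEntry(new JarEntry(entry.getKey()));
                    jar.write(entry.getValue().toByteArray());
                    jar.closeEntry();
                    written.add(entry.getKey());
                }
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = Paths.get(System.getProperty("java.io.tmpdir"), "jarcompiler-demo.jar");
        System.out.println(compileToJar("Hello", "public class Hello { }", jar));
    }
}
```

No .class file is written to disk here; the only file produced is the jar itself. A real build tool would also add a manifest and handle many source files.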
As the author is talking about extending the compiler itself, it is possible that he has knowledge of the built-in capabilities of the compiler (that is what the compiler is capable of, maybe with a little encouragement by tweaking the code).
Right now I’m investigating extending the Java 6 compiler to remove the unneeded file exists checks and possible jaring the class files directly in the compiler. [emphasis mine]
That capability, however, is certainly not supported officially (no documentation exists about it on the javac webpage).
At best, the feature is compiler dependent; possibly requiring modification of the compiler's source code.

Programming an Interpreter for a Compiler

I'm writing an interpreter for a compiler program in Java. After checking the source code's syntax and semantics, I want to be able to run the source code that is the input to my compiler. I'm wondering whether I can just translate some tokens: for example, can I replace out (which prints to the screen) with System.out.print, and then feed the translated source to Java to run it?
I've heard of using the Java Compiler API, would this be a good plan?
Thank you very much in advance!
What you are asking about is a virtual machine implementation technique. To run your code, in general you need to implement the following:
The first few steps I guess you have already done (design/describe the language semantics, construct the AST and perform the required validation of the code).
You need to generate your bytecode. Java itself works in exactly the same way: it generates another representation of the source code, going from human-readable to machine-readable.
Here you can see what Java bytecode looks like: http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/
You need to implement a virtual machine (a stack machine) that reads the bytecode and executes it.
So, as you can see, you should have 3 separate components (projects) for your task:
1. Language grammar
2. Compiler (byte code generator)
3. Virtual machine (interpreter of byte code)
P.S. I have experience creating a tiny Java-like compiler from scratch (grammar defined with ANTLR, plus implementations of the compiler and the virtual machine), so I can probably share more information with you (even source code) if you need something in particular.
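To make the third component more concrete, here is a minimal sketch of a stack machine. The four opcodes are invented for this example and have nothing to do with real JVM bytecodes; a real interpreter would also need variables, jumps and calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {
    // Hypothetical opcodes of a toy instruction set (not JVM bytecodes).
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;
    static final int MUL = 2;
    static final int HALT = 3;

    /** Executes the program and returns the value on top of the stack at HALT. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // prints 20
    }
}
```

The compiler's job is then to turn the AST into an int array like the one in main.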
You really need to read some books and/or take courses on compilers - this can't be solved by a two-paragraph answer on SO.
You could create a cross-compiler which reads your language and outputs Java code to do the same thing. This may be the simplest option.
The Java Compiler API can be used to compile Java code. You would need to translate your existing code to Java first to use it.
This would not be the same thing as writing an interpreter. Is this homework? Does the task say you have to write the interpreter or can you have the code run any way which works?
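The cross-compiler idea can be sketched in a few lines. This assumes, purely for illustration, that the source language prints with a statement of the form out <expression>; and that every other line is already valid Java; OutTranslator and its methods are made-up names. A real translator would work on the AST rather than on lines of text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutTranslator {
    // Assumed syntax of the toy language: out <expression>;
    private static final Pattern OUT = Pattern.compile("\\s*out\\s+(.+);\\s*");

    /** Translates one statement; anything that is not an out statement passes through. */
    static String translateLine(String line) {
        Matcher m = OUT.matcher(line);
        if (m.matches()) {
            return "System.out.print(" + m.group(1).trim() + ");";
        }
        return line;
    }

    /** Wraps the translated statements in a class with a main method. */
    static String toJavaSource(String className, List<String> lines) {
        StringBuilder java = new StringBuilder();
        java.append("public class ").append(className).append(" {\n");
        java.append("    public static void main(String[] args) {\n");
        for (String line : lines) {
            java.append("        ").append(translateLine(line)).append('\n');
        }
        java.append("    }\n}\n");
        return java.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaSource("Program", Arrays.asList("int x = 2 + 3;", "out x;")));
    }
}
```

The resulting String is ordinary Java source, so it can be handed to javac or to the Java Compiler API to produce a runnable class.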
Unfortunately you did not mention which scripting language you are planning to support. If it is one of the well-known languages, just use its ready-made interpreter written in pure Java. See BSF and Java 6 scripting (http://www.ibm.com/developerworks/java/library/j-javascripting1/)
If it is your own language:
Think twice: do you really need it?
If you are sure you need your own language, think about JavaCC.
First of all, thank you very much for the fast replies.
As part of our compiler project, we need to be able to compile and run a program written in our own specified language. The language is very similar to C. I am confused about how an interpreter works. Is there a simpler way to implement this, without generating bytecode? My idea was to translate each statement into equivalent Java statements and let Java handle the bytecode generation.
I would look into the topics mentioned. Again, thank you very much for the suggestions.
