How to dynamically decompile a Class Object on memory? - java

I'm building a tool that dynamically displays the source code of a running Java class. I need something that can decompile a Class object at runtime into a String of source code. I know decompilers such as Jad and DJ Java Decompiler can decompile a .class file, but I'm looking for a tool that can be used like this:
Class<?> c = ..; // get from runtime environment
String sourcecode = DecompileTool.decompileClassObject(c);
return sourcecode;
Does anyone know of such a DecompileTool? Thanks.

I'm not aware of any decompiler that can be used like that.
Indeed, in the general case it is not possible to implement a decompiler that works like that:
The Class<?> object that you get from the runtime doesn't provide any way to get to the bytecodes.
In order to get hold of the bytecodes, you would need to redo what the classloader does when it locates the ".class" file from the classpath.
I don't think there's a way to find out what classloaders are in use ... if you include the possibility of dynamically instantiated classloaders. (And such classloaders are normal practice in (for example) web containers.)
In the general case, a classloader will do that in ways that you cannot reproduce ... without reverse engineering and hard-coding the same logic into your decompiler adapter code.
Besides, doing this on the fly is probably pointless, because there is a significant chance that the decompiler will produce source code that isn't compilable.
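For classes whose loader also exposes the class file as a resource (the usual case for classpath directories and JARs), the lookup can often be repeated by asking for the ".class" resource. The following is only a minimal sketch under that assumption; ClassBytes and read are hypothetical names, and the method returns null for classes defined from in-memory byte arrays or by loaders that do not expose resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassBytes {
    /** Returns the bytes of c's .class resource, or null if the loader does not expose it. */
    public static byte[] read(Class<?> c) throws IOException {
        // A leading '/' makes Class.getResourceAsStream treat the name as absolute.
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) {
                return null; // e.g. generated or in-memory classes
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The returned bytes could then be handed to a decompiler; whether they match what is actually running depends on the classloader, as described above.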

I don't think that any of these decompilers support this type of ugly interface.
First of all, most decompilers will represent any code in a similar format to the actual compiler, so, an Abstract Syntax Tree. If you are lucky, and the decompiler does have an interface, it will probably be of this type. Handing back a raw String is unlikely to be satisfactory, because how would the person writing the decompiler have any idea as to how you wanted the code formatted (one of the biggest challenges in decompilation is presenting the result to the user!).
Instead, what you should do is write a little wrapper that does this properly: generate on the fly the files that need to be decompiled, send them through the decompiler, and extract the result into a String (which you can get immediately if you do clever forking and piping, but in reality you probably just want to format the output file of the decompiler).
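The piping half of such a wrapper could look roughly like the sketch below. DecompilerRunner and run are hypothetical names, and the sketch assumes the decompiler is an external command that can print its result to standard output; it is not tied to any particular decompiler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class DecompilerRunner {
    /** Runs an external command and returns everything it prints as one String. */
    public static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so that neither pipe can fill up and block the child.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }
}
```

The command list would hold the decompiler executable followed by its arguments, with the path of the temporary .class file last. Which flags make a given decompiler write to standard output is something to check in that tool's own documentation.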

Try Cavaj Java Decompiler; it may be useful for you. If you aren't satisfied with it, try JadClipse with the Eclipse IDE.

You can follow these steps:
1) Run an available decompiler on the class file, for example:
Process p = Runtime.getRuntime().exec("jad /location/CompilesClass.class");
BufferedReader in = new BufferedReader(
new InputStreamReader(p.getInputStream()));
String line = null;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
2) The compiled class is now decompiled into a file with the .jad extension (the source code), in the directory where the class file is located.
3) You can then read that .jad file with the Scanner class (JDK 6), like this:
Scanner scanner = new Scanner(new File("locationOfJADFIle")).useDelimiter("\n");
while(scanner.hasNext()){
System.out.println(scanner.next());
}

Related

Eclipse file replacement before compile (like a pre-processor)?

So, given that Java has little to no support for unsigned types, I'm writing a small API to handle them (for now, I have UnsignedByte and UnsignedInt). The approach is simple: store each value in the next larger type (byte->short, int->long), extend the Number class, and implement some calculation and representation utility methods.
The problem is that it is very verbose, and boring, to have to write code like this every time:
UnsignedByte value = new UnsignedByte(15);
UnsignedByte convert = new UnsignedByte(someIntValue);
I was wondering: is there a way to set up something like a "file pre-processor" in Eclipse that automatically replaces some predefined strings with other predefined strings before compiling the files?
For example: replace U(x) with new UnsignedByte(x), so it would be possible to use:
UnsignedByte value = U(15);
UnsignedByte convert = U(someIntValue);
Yes, I could create a method called U(...) and use import static, but even then it would be a lot of trouble to do that in every class that uses my unsigned types.
I could write a simple Java program that replaces these expressions in a file, but the question is: how could I integrate it into Eclipse so that it runs every time a Java file is compiled?
I would recommend using Eclipse Templates for this instead. I know it's not exactly what you asked for, but it's very simple and works out of the box.
When you write sysout in Eclipse and press Ctrl+Space it gives you an option to replace that with System.out.println();
You can find more information in the following link
How to add shortcut keys for java code in eclipse
I can point you to how one project I know of does this: they have a set of Python scripts that generate a whole set of classes (Java files) from a template base file. They run the scripts manually rather than as part of the build.
Have a look here for the specific example. In this code they have a class for operating on double, but from this class they want to generate code to operate on float, int, etc all in the same way.
There is, of course, a big debate about whether generated code should be checked in or not to source repository. I leave that issue aside and hope that the above example is good to get you going.

How to get the code of a method from its Class

I am working with this awesome little piece of code, and I need some help figuring it out.
Basically it has a big byte array, which is used to load a new class:
Class<?> clazz = class1.loadClass(name, byteArray);
clazz.getMethod("run", new Class[0]).invoke(null, new Object[0]);
This is loadClass:
public Class<?> loadClass(String paramString, byte[] paramArrayOfByte) throws ClassNotFoundException {
    return defineClass(paramString, paramArrayOfByte, 0, paramArrayOfByte.length);
}
And then it calls 'run' on this class. Now, besides this being an awesome way to hide your classes, how do I get a look inside this class? I have the actual class in 'clazz', but I don't get further than the attributes and their values and the function names. I want to know what 'run' actually does. Is there any way to print the content of this method? Or can I transform the byte array containing the class into something readable?
Thanks!
What if you write this byteArray to a .class file and open it with a Java decompiler (like JD-GUI)?
You are asking how to turn a class file into something meaningful, so basically you are asking for it to be decompiled, as JD (Java Decompiler) does. Note also that .class files are loaded into memory by the classloader.
I would suggest dumping the class bytes to a file with a plain FileOutputStream (not an ObjectOutputStream, which would add serialization framing and leave you with an invalid class file), then decompiling it with any decompiler.
Please note that this process does not guarantee that the decompiled source is accurate.
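Dumping the byte array could be sketched as follows. ClassDumper and dump are hypothetical names; the only assumption is that the array holds the same bytes that were passed to defineClass.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassDumper {
    /** Writes the raw class bytes unchanged to <dir>/<simple name>.class and returns the path. */
    public static Path dump(Path dir, String className, byte[] classBytes) throws IOException {
        // Use only the part after the last '.' so the file lands directly in dir.
        String simpleName = className.substring(className.lastIndexOf('.') + 1);
        Path target = dir.resolve(simpleName + ".class");
        return Files.write(target, classBytes);
    }
}
```

Calling this from inside loadClass, just before defineClass, would leave a file that JD-GUI or a similar tool can open.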

parsing incomplete Java source code

For a certain problem I need to parse a Java source code fragment that is potentially incomplete. For example, the code can refer to variables that are not defined in that fragment.
In that case, I would still like to parse the incomplete Java code, transform it into a convenient inspectable representation, and be able to generate source code from that abstract representation.
What is the right tool for this? In this post I found suggestions to use Antlr, JavaCC or the Eclipse JDT.
However, I did not find any reference regarding dealing with incomplete Java source code fragments, hence this question (and in addition the linked question is more than two years old, so I am wondering if something new is on the map).
As an example, the code could be something like the following expression:
"myMethod(aVarName)"
In that case, I would like to be able to somehow detect that the variable aVarName is referenced in the code.
Uhm... This question does not have anything even vaguely like a simple answer. Any of the above parser technologies will allow you to do what you wish to do, if you write the correct grammar and configure the parser to do fallback parsing, skip over unknown tokens, and that sort of thing.
The least amount of work to get you where you're going is either to use ANTLR, which has resumable parsing and comes with a reasonably complete Java 7 grammar, or to see what you can pull out of the Eclipse JDT (which is used for the error and intention annotations and syntax highlighting in the Eclipse IDE).
Note that none of this stuff is easy -- you're writing klocs, not just importing a class and telling it to go.
At a certain point of incorrect/incompleteness all of these strategies will fail just because no computer ( or even person for that matter ) is able to discern what you mean unless you at least vaguely say it correctly.
Eclipse contains just that: a compiler that can cope with incomplete Java code (basically, that was one reason for these guys to implement their own Java compiler). See here for a better explanation.
There are several tutorials that explain the ASTParser, here is one.
If you just want basic parsing - an undecorated AST - you can use existing Java parsers. But from your question I understand you're interested in deeper inspection of the partial code. First, be aware the problem you are trying to solve is far from simple, especially because partial code introduces a lot of ambiguities.
But there is an existing solution - I needed to solve a similar problem, and found that a nice fellow called Barthélémy Dagenais has worked on it, producing a paper and a pair of open-source tools - one based on Soot and the other (which is generally preferable) on Eclipse. I have used both and they work, though they have their own limitations - don't expect miracles.
Here's a direct link to a quick tutorial on how to start with the Eclipse-based tool.
I needed to solve a similar problem in my recent work. I have tried many tools, including the Eclipse JDT ASTParser, the Python library javalang, and PPA, and I'd like to share my experience. To sum up, they can all parse code fragments to some extent, but each of them occasionally fails when the fragment is too ambiguous.
Eclipse JDT ASTParser
Eclipse JDT ASTParser is the most powerful and widely-used tool. This is a code snippet to parse the method invocation node.
ASTParser parser = ASTParser.newParser(AST.JLS8);
parser.setResolveBindings(true);
parser.setKind(ASTParser.K_STATEMENTS);
parser.setBindingsRecovery(true);
Map options = JavaCore.getOptions();
parser.setCompilerOptions(options);
parser.setUnitName("test");
String src = "System.out.println(\"test\");";
String[] sources = { };
String[] classpath = {"C:/Users/chenzhi/AppData/Local/Programs/Java/jdk1.8.0_131"};
parser.setEnvironment(classpath, sources, new String[] { }, true);
parser.setSource(src.toCharArray());
final Block block = (Block) parser.createAST(null);
block.accept(new ASTVisitor() {
public boolean visit(MethodInvocation node) {
System.out.println(node);
return false;
}
});
Pay attention to parser.setKind(ASTParser.K_STATEMENTS): it sets the kind of construct to be parsed from the source. ASTParser defines four kinds (K_COMPILATION_UNIT, K_CLASS_BODY_DECLARATIONS, K_EXPRESSION, K_STATEMENTS); see the Javadoc to understand the differences between them.
javalang
javalang is a simple Python library. This is a code snippet to parse the method invocation node.
import javalang

src = 'System.out.println("test");'
tokens = javalang.tokenizer.tokenize(src)
parser = javalang.parser.Parser(tokens)
try:
    ast = parser.parse_expression()
    if type(ast) is javalang.tree.MethodInvocation:
        print(ast)
except javalang.parser.JavaSyntaxError as err:
    print("wrong syntax", err)
Pay attention to ast = parser.parse_expression(): just like parser.setKind() in the Eclipse JDT ASTParser, you must choose the proper parsing function or you will get a javalang.parser.JavaSyntaxError exception. You can read the source code to figure out which function you should use.
PPA
Partial Program Analysis for Java (PPA) is a static analysis framework that transforms the source code of an incomplete Java program into a typed Abstract Syntax Tree. As @Oak said, this tool came from academia.
PPA comes as a set of Eclipse plug-ins, which means it needs to run with Eclipse. It provides a headless mode that runs without displaying the Eclipse GUI or requiring any user input, but it is still heavyweight.
String src = "System.out.println(\"test\");";
ASTNode node = PPAUtil.getSnippet(src, new PPAOptions(), false);
// Walk through the compilation unit.
node.accept(new ASTVisitor() {
public boolean visit(MethodInvocation node) {
System.out.println(node);
return false;
}
});

Java source refactoring of 7000 references

I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring support is not up to the task, but I really want to automate it.
So, how can I get the job done?
Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source file, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).
Eclipse is able to do that using Refactor -> Change Method Signature, which lets you provide default values for the new parameters.
For the class parameter, the default value should be this.getClass(), but you are right in your comment: I don't know how to handle the method name parameter.
IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.
I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.
Maybe I'm being naive, but why can't you just overload the method name?
void thing(String paramA) {
    // old signature delegates to the new one with default values
    thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}
void thing(String paramA, Class<?> paramB, String paramC) {
    // new method
}
Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
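A minimal sketch of that idea follows. CallerAwareLogger, format, and demo are hypothetical names, and the sketch uses new Throwable().getStackTrace() instead of Thread.currentThread().getStackTrace(), because element 0 of a new Throwable's trace is the method that created it, which makes the caller's index easier to reason about.

```java
public class CallerAwareLogger {
    /** Prefixes the message with the class and method that called this method. */
    public static String format(String message) {
        // Element 0 is this method (where the Throwable is created); element 1 is its caller.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        if (stack.length < 2) {
            return "unknown: " + message; // a JVM is allowed to omit frames
        }
        StackTraceElement caller = stack[1];
        return caller.getClassName() + "." + caller.getMethodName() + ": " + message;
    }

    // Example caller: the returned text names this method, not format().
    static String demo() {
        return format("hello");
    }
}
```

With something like this inside the existing log(String), the 7000 call sites would not need to change, at the cost of building a stack trace on every call.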
If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
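As a rough Java counterpart to that script idea, the sketch below derives the class name from the file name and inserts it as the second argument. LogCallRewriter, rewrite, and the "unknownMethod" placeholder are hypothetical; the regular expression only handles single arguments that contain no commas or parentheses, so string literals such as "a, b" or nested calls are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogCallRewriter {
    // Matches log(<one argument without commas or parentheses>), but not e.g. catalog(...).
    private static final Pattern SINGLE_ARG_LOG = Pattern.compile("\\blog\\(([^,()]*)\\)");

    /** Rewrites log(x) to log(x, <ClassName>.class, "unknownMethod") in the given source text. */
    public static String rewrite(String fileName, String source) {
        // Assumption: the file is named after its top-level class, e.g. Foo.java -> Foo.
        String className = fileName.replaceFirst("\\.java$", "");
        String extraArgs = Matcher.quoteReplacement(className + ".class, \"unknownMethod\"");
        return SINGLE_ARG_LOG.matcher(source).replaceAll("log($1, " + extraArgs + ")");
    }
}
```

Filling in the real method name needs knowledge of the enclosing method, which is where a proper parser becomes the safer choice.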
Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising; I upvoted it.)
I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees for the entire set of code.
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.
In fact, your problem is not one for a click-and-play engine that simply replaces all occurrences of
log("some weird message");
by
log("some weird message", this.getClass(), new Exception().getStackTrace()[0].getMethodName());
because that has little chance of working in every case (inside static methods, for example).
I would suggest taking a look at Spoon. This tool allows source code parsing and transformation, so you can carry out the change as a slow (it is obviously code-based) but controlled operation.
However, you could also consider changing your existing method into one that inspects the stack trace to get this information or, even better, use log4j internally with a log formatter that displays the correct information.
I would search and replace log( with log(#class, #methodname,
Then write a little script in any language (even Java) to find the class names and method names and replace the #class and #methodname tokens...
Good luck
If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
public void log(String text)
{
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    new Throwable().printStackTrace(pw); // capture the current call stack
    pw.flush();
    sw.flush();
    String stackTraceAsLog = sw.toString();
    // do something with text and stackTraceAsLog
}

Decompile JavaEE

We have a Java EE app whose vendor no longer exists (due to bankruptcy). Unfortunately we have to make some changes to the functionality of the app, and this means reverse engineering it.
We used JD-GUI to reverse-engineer about 70% of the app's classes, and then tweaked them manually so that they build in Eclipse.
However, the rest are not so easy to build, because they appear to have been produced by code generators. What tools can I use to assist further?
Edit:
This is one example of the difficulties:
return ((SchemaTypeSystem)Class.forName(
"org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl",
true,
class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader())
.getConstructor(new Class[] { Class.class })
.newInstance(new Object[] { TypeSystemHolder.class }));
It's hard to know what this is:
class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader())
Give JAD (http://www.varaneckas.com/jad) a try.
The problematic code that you show is equivalent to the following:
1) Class class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder;
2) ClassLoader loader = class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader();
3) Class type = Class.forName("org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl", true, loader);
4) Constructor ctor = type.getConstructor(Class.class);
5) Object obj = ctor.newInstance(TypeSystemHolder.class);
6) SchemaTypeSystem result = (SchemaTypeSystem) obj;
7) return result;
The part you are having trouble with is line 1, which represents a local variable or a field (possibly static). Older Java compilers (before Java 5) convert the expression 'TypeSystemHolder.class' into a call to Class.forName, made through a synthetic helper method, and store the result in a static field. This initialization happens once in each class that references 'TypeSystemHolder.class', and the compiler replaces each call site that uses this expression with a field access.
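Written out by hand, the idiom looks roughly like the sketch below. This is only an illustration of the pattern as pre-Java 5 compilers typically emitted it, not the actual generated code; ClassLiteralIdiom, cachedClass, lookup, and stringClass are hypothetical names, and java.lang.String stands in for TypeSystemHolder so that the example is self-contained.

```java
public class ClassLiteralIdiom {
    // Plays the role of the synthetic static field class$...$TypeSystemHolder.
    static Class<?> cachedClass;

    // Plays the role of the synthetic helper method that older compilers emitted.
    static Class<?> lookup(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new NoClassDefFoundError(e.getMessage());
        }
    }

    // Roughly what a use of "String.class" expanded to: look up once, then reuse the field.
    static Class<?> stringClass() {
        if (cachedClass == null) {
            cachedClass = lookup("java.lang.String");
        }
        return cachedClass;
    }
}
```

When a decompiler shows the field name on its own, it can usually be replaced by the corresponding class literal in the rewritten source.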
Most decompilers fail to translate this idiom back to the original call to 'TypeSystemHolder.class' but JAD handles this quite well. Additionally, there is a plug-in that integrates JAD (and others) into Eclipse (http://jadclipse.sourceforge.net).
Unfortunately, decompilers do not handle every code sequence generated by a compiler so some manual rewriting is always required. For example, the Java compiler may generate code for one exception handling block that overlaps with code for another exception handling block. Decompilers are unable to separate this back into two catch blocks. In this case, one usually sees goto statements littered throughout the code (not valid Java) or the decompiler simply gives up on that method.
Also, you are correct that this is generated code. Specifically, it is from the XmlBeans compiler, which parses an XML Schema and generates binding classes for Java, allowing one to serialize and deserialize XML documents conforming to that schema. If you have access to the schema, it would be better to incorporate XmlBeans into your build instead of decompiling these classes.
Take a look at Soot. It doesn't decompile to Java source code, but uses an intermediate layer that is compilable. While it's yet another language to learn, you will get the flexibility you need.
Additionally, if you are only making small tweaks, you can just attack files individually and leave the rest intact.
