We have a Java EE app whose vendor no longer exists (due to bankruptcy). Unfortunately, we have to make some changes to the app's functionality, which means reverse engineering it.
We used JD-GUI to decompile about 70% of the classes, and then tweaked them manually so that they build in Eclipse.
However, the rest are not so easy to build, apparently because they were produced by code generators. What tools can I use to assist further?
Edit:
This is one example of the difficulties:
return ((SchemaTypeSystem)Class.forName(
"org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl",
true,
class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader())
.getConstructor(new Class[] { Class.class })
.newInstance(new Object[] { TypeSystemHolder.class }));
It's hard to know what is
class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader())
Give JAD (http://www.varaneckas.com/jad) a try.
The problematic code that you show is equivalent to the following:
1) Class class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder;
2) ClassLoader loader = class$schema$system$s322D2AAD7A06BA82525CDB874D86D59A$TypeSystemHolder.getClassLoader();
3) Class type = Class.forName("org.apache.xmlbeans.impl.schema.SchemaTypeSystemImpl", true, loader);
4) Constructor ctor = type.getConstructor(Class.class);
5) Object obj = ctor.newInstance(TypeSystemHolder.class);
6) SchemaTypeSystem result = (SchemaTypeSystem) obj;
7) return result;
The part you are having trouble with is line 1, which represents a local variable or a field (possibly static). Older Java compilers (before Java 5) convert the expression 'TypeSystemHolder.class' into a call to Class.forName and store the result in a synthetic static field. This initialization happens once in each class that references 'TypeSystemHolder.class', and the compiler replaces each call site that uses this expression with a field access.
Most decompilers fail to translate this idiom back to the original call to 'TypeSystemHolder.class' but JAD handles this quite well. Additionally, there is a plug-in that integrates JAD (and others) into Eclipse (http://jadclipse.sourceforge.net).
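As a rough illustration of the idiom described above, here is a hand-written equivalent of what such a compiler emits for a class literal. The names ClassLiteralDesugaring, cachedHolderClass, classFor and holderClass are made up for this sketch; in the decompiled output the cache field is the class$...$TypeSystemHolder name.

```java
class ClassLiteralDesugaring {
    // Hypothetical stand-in for the XmlBeans-generated TypeSystemHolder class.
    static class TypeSystemHolder {}

    // Plays the role of the synthetic static field class$...$TypeSystemHolder.
    private static Class<?> cachedHolderClass;

    // Plays the role of the synthetic helper method generated by old compilers.
    static Class<?> classFor(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new NoClassDefFoundError(e.getMessage());
        }
    }

    // What each use of TypeSystemHolder.class turns into: a lazily filled field.
    static Class<?> holderClass() {
        if (cachedHolderClass == null) {
            // The name is read from the literal only to keep this sketch
            // independent of the package it is compiled in.
            cachedHolderClass = classFor(TypeSystemHolder.class.getName());
        }
        return cachedHolderClass;
    }

    public static void main(String[] args) {
        // Both expressions refer to the same Class object.
        System.out.println(holderClass() == TypeSystemHolder.class);
    }
}
```

When rewriting decompiled code by hand, the whole field-plus-helper construct can therefore be replaced by the plain class literal.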
Unfortunately, decompilers do not handle every code sequence generated by a compiler so some manual rewriting is always required. For example, the Java compiler may generate code for one exception handling block that overlaps with code for another exception handling block. Decompilers are unable to separate this back into two catch blocks. In this case, one usually sees goto statements littered throughout the code (not valid Java) or the decompiler simply gives up on that method.
Also, you are correct that this is generated code. Specifically, it is from the XmlBeans compiler, which parses an XML Schema and generates binding classes for Java, allowing one to serialize and deserialize XML documents conforming to that schema. If you have access to the schema, it would be better to incorporate XmlBeans into your build instead of decompiling these classes.
Take a look at Soot. It doesn't decompile to Java source code, but uses an intermediate representation that is compilable. While it's yet another language to learn, you will get the flexibility you need.
Additionally, if you are only making small tweaks, you can just attack files individually and leave the rest intact.
In the following expression:
T(org.apache.commons.io.IOUtils).toString(T(java.lang.Runtime)
.getRuntime().exec(T(java.lang.Character).toString(105)
.concat(T(java.lang.Character).toString(100))).getInputStream())
Does the '105' in toString(105) refer to an itemized object within the Character class?
and
Why is the 'T', which I believe expresses a generic type, and is used 4 times in this expression, a necessary feature of Java?
The toString() method that seems to be invoked here is actually the toString(char) (static) method of java.lang.Character. Quoting the documentation:
public static String toString(char c)
Returns a String object representing the specified char.
The result is a string of length 1 consisting solely of the specified char.
Parameters:
c - the char to be converted
Returns:
the string representation of the specified char
Since:
1.4
Note that 100 and 105 are also valid char values where 100 == 'd' and 105 == 'i'.
Update: after knowing the context, I am now confident that this code is intended to be injected into a template for a web page. The template engine used provides special syntax for accessing static methods where T(Classname) resolves to just Classname (not Classname.class!) in the resulting Java code.
So your code would be translated to:
org.apache.commons.io.IOUtils.toString(java.lang.Runtime
.getRuntime().exec(java.lang.Character.toString(105)
.concat(java.lang.Character.toString(100))).getInputStream())
The full qualification of the class names is necessary because we do not know if those classes are imported on the attacked site (or if the template engine even allows imports or class names must always be fully qualified).
A more readable version of the code that assumes imports is
IOUtils.toString(
Runtime.getRuntime().exec(
Character.toString(105).concat(Character.toString(100))
).getInputStream()
)
And after a little de-obfuscation...
IOUtils.toString(Runtime.getRuntime().exec("id").getInputStream())
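To check the de-obfuscation step without executing anything, the following sketch only rebuilds the command string from the two character codes. The class and method names are invented for illustration, and the explicit cast to char is an assumption made so that the example compiles on any Java version.

```java
class PayloadDecoder {
    // Rebuilds the string that the payload assembles from numeric char codes.
    static String decode(int... codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(Character.toString((char) code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 105 is 'i' and 100 is 'd', so this prints: id
        System.out.println(decode(105, 100));
    }
}
```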
Whatever this is, it is definitely NOT meaningful Java code.
And the fact that you can provide it as a search query on some site is not evidence that it is Java either.
I suspect that this is actually some custom (site-specific?) query language. That makes it futile to try to understand it as a Java snippet.
Your theory that T could denote a generic type parameter doesn't work. Java would not allow you to write T(...) if that was the case.
Furthermore, if we assume that org.apache.commons.io.IOUtils, java.lang.Runtime and so on are intended to refer to Java class objects, then the correct Java syntax would be org.apache.commons.io.IOUtils.class, java.lang.Runtime.class and so on.
So what does it mean?
Well, a bit of Googling found me some other examples that look like yours. For instance:
https://github.com/VikasVarshney/ssti-payload
appears to generate "code" that is reminiscent of your example. This is SSTI - Server Side Template Injection, and it appears to be targeting a Java expression language (the T(...) type operator is Spring Expression Language, SpEL, syntax).
And I think this particular example is an attempt to run the Linux id program ... which would output some basic information about the user and group ids for the account running your web server.
Does it matter? Well only if your site is vulnerable to SSTI attacks!
How would you know if your site is vulnerable?
By understanding the nature of SSTI with respect to EL and other potential attack vectors ... and auditing your codebase and configurations.
By using a vulnerability scanner to test your site and/or your code-base.
By employing the services of a trustworthy IT security company to do some penetration testing.
In this case, you could also try to use curl to repeat the attempted attack ... as the hacker would have done ... based on what is in your logs. Just see if it actually works. Note that running the id program does no actual damage to your system. The harm would be in the information that is leaked to a hacker ... if they succeed.
Note that if this hack did succeed, then the hacker would probably try to run other programs. These could do some damage to your system, depending on how well your server was hardened against such things.
So, given that Java has little to no support for unsigned types, I'm currently writing a small API to handle them (for now, I have UnsignedByte and UnsignedInt). The approach is simple: store each value in the next wider type (byte->short, int->long), extend the Number class, and implement some calculation and representation utility methods.
The problem is: it is actually very verbose - and boring - to have to, every time, code things like:
UnsignedByte value = new UnsignedByte(15);
UnsignedByte convert = new UnsignedByte(someIntValue);
I was wondering: is there any way to implement, on Eclipse, something like a "file pre-processor", in a way that it will automatically replace some pre-defined strings with other pre-defined strings before compiling the files?
For example: replace U(x) with new UnsignedByte(x), so it would be possible to use:
UnsignedByte value = U(15);
UnsignedByte convert = U(someIntValue);
Yes, I could create a method called U(...) and use import static, but even then, it would be so much trouble doing it for every class that I would use my unsigned types.
I could write a simple Java program that would replace these expressions in a file, but the problem is: How could I integrate that on Eclipse, in a way that it would call/use it every time a Java file is compiled?
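For reference, the static-factory alternative mentioned above might look like the sketch below. It follows the design described in the question (an unsigned byte stored in a short, extending Number); the range check, the plus method and the wrap-around behaviour are assumptions made for illustration, not part of the original API.

```java
final class UnsignedByte extends Number {
    private static final long serialVersionUID = 1L;

    // An unsigned byte (0..255) stored in the next wider signed type.
    private final short value;

    private UnsignedByte(short value) {
        this.value = value;
    }

    // Short factory method, intended to be used through a static import.
    static UnsignedByte U(int value) {
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException("out of range for an unsigned byte: " + value);
        }
        return new UnsignedByte((short) value);
    }

    // Hypothetical arithmetic helper: addition that wraps around modulo 256.
    UnsignedByte plus(UnsignedByte other) {
        return new UnsignedByte((short) ((value + other.value) & 0xFF));
    }

    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }
    @Override public String toString() { return Integer.toString(value); }
}
```

Once the class lives in a named package, a single static import of U per file makes call sites read UnsignedByte value = U(15); with no preprocessing step in the build.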
I would recommend using Eclipse Templates for this instead. I know it's not exactly what you asked for, but it's very simple and works out of the box.
When you write sysout in Eclipse and press Ctrl+Space it gives you an option to replace that with System.out.println();
You can find more information in the following link
How to add shortcut keys for java code in eclipse
I can point you at how one project I know of does this: they have a set of Python scripts that generate a whole set of classes (Java files) from a template base file. They run the script manually, as opposed to as part of the build.
Have a look here for the specific example. In this code they have a class for operating on double, but from this class they want to generate code to operate on float, int, etc all in the same way.
There is, of course, a big debate about whether generated code should be checked in or not to source repository. I leave that issue aside and hope that the above example is good to get you going.
I'm new to compiler design and have a few years of experience with Java.
Using this and the paper
It looks like class hierarchy analysis and rapid type analysis yield the information needed for de-virtualisation. But where should that information be patched back: into the source code or into the bytecode? And how do I check the results?
I'm trying to understand how things really happen, but I'm stuck here.
For example, here is a sample program taken from the paper mentioned above.
public class MyProgram {
    public static void main(String[] args) {
        EUCitizen citizen = getCitizen();
        citizen.hasRightToVote(); // Call site 1
        Estonian estonian = getEstonian();
        estonian.hasRightToVote(); // Call site 2
    }
    private static EUCitizen getCitizen() {
        return new Estonian();
    }
    private static Estonian getEstonian() {
        return new Estonian();
    }
}
Using class hierarchy analysis, we can conclude that, since none of the subclasses override hasRightToVote(), the dynamic method invocation can be replaced with a static procedure call to Estonian#hasRightToVote(). But where and how should this replacement be made? How do I tell the JVM (feed the JVM) the information gathered during the analysis?
You can't change the source code and put it there? Could anyone provide an example, so that I can start trying new ways to do the analysis and still be able to patch that information in?
Thanks.
Class Hierarchy Analysis is an optimization done by the virtual machine itself at runtime; you do not have to tell the VM anything. It simply does the analysis by itself based on the information available in the class files.
What generally happens is that analysis results are typically stored as some kind of association with a program representation, or are used immediately to effect the optimization so "nothing" needs to be stored.
You are right: there is generally no "good" way to annotate the source code with an analysis result (you can use Java annotations as one way). But the compiler has already read the source code and isn't going to read it again.
In general, the program is parsed and variety of compiler-like structures are built (ASTs, symbol tables, control flow graphs, data flow arcs, ...) by the compiler pretty much before any serious analysis/optimization begins. A low level model of the program (data flow over the operators) is normally what gets analyzed, and the optimization analyzer will either decorate this structure with its opinions, or often just directly modify this structure to achieve the effect of the optimization.
With Java, there are two opportunities to do this: in JavaC, and in the JITter. My understanding (probably wrong, probably varies across JavaC implementations) is that not much optimization occurs in JavaC at all; it just generates naive JVM bytecode, and that all the real work is done in the JITter. The JITter doesn't have source code, but it can do all the same kinds of analysis (control flow, dataflow, ...) on the byte code that one can do on classic compiler structures, and thus achieve the same effect.
I had some doubts about the same thing, and Rohan Padhey cleared them up.
In Java, I don't think there is a way to specify monomorphism of virtual method calls in bytecode. The de-virtualization analysis usually happens in the JIT compiler, which compiles bytecode to native code, and it does so using dynamic analysis.
Why Patching Is a Problem:
In Java bytecode, the only method call instructions are invokestatic, invokedynamic, invokevirtual, invokeinterface and invokespecial (the last is used for constructors, etc.). Of these, invokestatic is the only general-purpose call that never involves a virtual method table lookup, since static methods cannot be overridden and used polymorphically on objects; invokespecial is also non-virtual, but it is limited to constructors, super calls and similar special cases.
Hence, while there is no way to do a compile-time specification of the target method, you can replace virtual calls with static calls. How? consider an object "x" with a method "foo", and a call-site:
x.foo(arg1, arg2, ...)
If you know for sure that "x" is of the class "A", then you can transform this to:
A.static_foo(x, arg1, arg2, ...)
where "static_foo" is a newly created static method in class A whose body contains exactly everything that the body of "foo()" in "A" would have done, except that references to "this" inside the body should now be replaced by the first parameter, whatever you may call it.
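A minimal sketch of this transformation, written by hand at source level. The class name is borrowed from the example program in the question, while the age field and the voting rule are hypothetical and exist only so that the method body has a "this" reference to rewrite.

```java
class EUCitizen {
    private final int age; // hypothetical field, not taken from the paper

    EUCitizen(int age) {
        this.age = age;
    }

    // Original virtual method.
    boolean hasRightToVote() {
        return this.age >= 18;
    }

    // Static twin: same body, with "this" replaced by the first parameter.
    static boolean static_hasRightToVote(EUCitizen self) {
        return self.age >= 18;
    }
}

class DevirtualizedCalls {
    public static void main(String[] args) {
        EUCitizen citizen = new EUCitizen(30);
        boolean viaVirtualCall = citizen.hasRightToVote();                // compiles to invokevirtual
        boolean viaStaticCall = EUCitizen.static_hasRightToVote(citizen); // compiles to invokestatic
        System.out.println(viaVirtualCall == viaStaticCall);
    }
}
```

The rewrite is only valid when the analysis has proven the receiver's class; if a subclass overrode hasRightToVote(), the static call would silently bypass the override.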
That is exactly what the Whole-Jimple-Optimization-Pack (WJOP) in Soot does.
As regards static analysis using Soot, there is an optimization pack that does devirtualization using a work-around: https://github.com/Sable/soot/wiki/Whole-program-Devirtualization-Optimizations
But that's just a hack.
Why the JIT Does It Better:
The JIT does this better because a static analysis has to be sound: before applying the transformation, you must be sure that the target of the virtual call is the same class 100% of the time. With JIT compilation you can find more opportunities for optimization. Even if the target is a single class only 90% of the time, you can just-in-time compile the code for the most frequently taken route and fall back to the bytecode in the 10% of cases where the prediction was wrong, because the mistake can be detected dynamically. While the fall-back is expensive, correct predictions in the common case lead to an overall benefit. With a static transformation, you have to decide up front whether or not to optimize, and that decision had better be sound.
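The guarded speculation described above can be imitated at source level to see its shape. This is only a conceptual sketch: the Shape, Square and Circle types are invented, and a real JIT makes this decision from runtime profiles and falls back by deoptimizing, not through hand-written Java like this.

```java
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class GuardedDispatch {
    // Assume profiling showed that the receiver is almost always a Square.
    static double area(Shape s) {
        if (s.getClass() == Square.class) {
            // Fast path: cheap type check, then the "inlined" body of Square.area().
            Square sq = (Square) s;
            return sq.side * sq.side;
        }
        // Slow path: the prediction was wrong, so use ordinary virtual dispatch.
        return s.area();
    }
}
```

The guard is what makes the optimization safe without a sound whole-program proof: a wrong prediction costs time, not correctness.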
I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring support is not up to the task, but I really want to automate it.
So, how can I get the job done?
Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source file, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).
Eclipse is able to do that using Refactor -> Change Method signature and provide default values for the new parameters.
For the class parameter the default value should be this.getClass(), but you are right in your comment: I don't know how to do it for the method name parameter.
IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.
I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.
Maybe I'm being naive, but why can't you just overload the method name?
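// Parameter types assumed from the log(String, Class, String) signature in the question.
void thing(String paramA) {
    // old signature, kept for existing callers
    thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}
void thing(String paramA, Class<?> paramB, String paramC) {
    // new method
}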
Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
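A minimal sketch of that idea, assuming a static log method; CallerAwareLogger and OrderService are made-up names. Instead of relying on a fixed array index, the code looks for its own frame and takes the one after it, since the number of leading frames in the array can vary between JVMs.

```java
class CallerAwareLogger {
    // Logs the message prefixed with the class and method that called log().
    static String log(String message) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        String self = CallerAwareLogger.class.getName();
        String origin = "<unknown caller>";
        for (int i = 0; i < stack.length - 1; i++) {
            boolean isLogFrame = stack[i].getClassName().equals(self)
                    && stack[i].getMethodName().equals("log");
            if (isLogFrame) {
                // The frame right after log() belongs to the caller.
                StackTraceElement caller = stack[i + 1];
                origin = caller.getClassName() + "." + caller.getMethodName();
                break;
            }
        }
        String line = origin + ": " + message;
        System.out.println(line);
        return line;
    }
}

class OrderService {
    static String placeOrder() {
        return CallerAwareLogger.log("order placed");
    }
}
```

With this approach the 7000 call sites stay untouched, at the cost of capturing a stack trace on every log call.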
If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
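As a sketch of the class-name half of that script, here is a regex-based equivalent in Java. It is only an approximation under stated assumptions: the class name is derived from the file name, the method name is left as a placeholder (TODO_METHOD is invented here), and calls whose argument contains a comma or parentheses are deliberately skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogCallRewriter {
    // Matches log(<one argument>) where the argument has no comma or parenthesis.
    private static final Pattern SINGLE_ARG_LOG =
            Pattern.compile("\\blog\\(([^,()]+)\\)");

    // Rewrites single-argument log calls in the given source text.
    static String rewrite(String source, String fileName) {
        String className = fileName.replaceFirst("\\.java$", "");
        // $1 re-inserts the original argument; the class name is quoted so that
        // characters such as '$' are not read as group references.
        String replacement = "log($1, " + Matcher.quoteReplacement(className)
                + ".class, \"TODO_METHOD\")";
        return SINGLE_ARG_LOG.matcher(source).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("log(\"saving\");", "Invoice.java"));
    }
}
```

Because this works on text and not on parsed code, it cannot tell which log method a call resolves to, which is the limitation the parser-based answers address.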
Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising; I upvoted it.)
I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees for the entire code base.
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.
In fact, your problem cannot be solved by a click'n'play engine that replaces all occurrences of
log("some weird message");
by
log("some weird message", this.getClass(), new Exception().getStackTrace()[0].getMethodName());
as that is unlikely to work in every case (static methods, for example).
I would suggest taking a look at Spoon. This tool parses and transforms source code, letting you carry out the change in a slow (obviously code-based) but controlled way.
However, you could also consider replacing your current method with one that inspects the stack trace to get the information or, even better, use log4j internally with a log formatter that displays the correct information.
I would search and replace log( with log(#class, #methodname,
Then write a little script in any language (even java) to find the class name and the method names and to replace the #class and #method tokens...
Good luck
If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
// requires java.io.PrintWriter and java.io.StringWriter
public void log(String text)
{
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    // capture the current call stack as text
    new Throwable().printStackTrace(pw);
    pw.flush();
    sw.flush();
    String stackTraceAsLog = sw.toString();
    //do something with text and stackTraceAsLog
}
Suppose I want to add minor syntactic sugars to Java. Just little things like adding regex pattern literals, or perhaps base-2 literals, or multiline strings, etc. Nothing major grammatically (at least for now).
How would one go about doing this?
Do I need to extend the bytecode compiler? (Is that possible?)
Can I write Eclipse plugins to do simple source code transforms before feeding it to the standard Java compiler?
I would take a look at Project Lombok and try to reuse the approach they take. They use Java 5 annotations to hook in a Java agent which can manipulate the abstract syntax tree before the code is compiled. They are currently working on an API that allows custom transformers to be written and used with javac or with major IDEs such as Eclipse and NetBeans. As well as annotations which trigger code generation, they are also planning to add syntax changes (possibly mixins or pre-Java 7 closure syntax).
(I may have some of the details slightly off, but I think I'm pretty close).
Lombok is open source so studying their code and trying to build on that would probably be a good start.
Failing that, you could attempt to change the javac compiler. Though from what I've heard that's likely to be a hair-pulling exercise in frustration for anyone who is not a compiler and Java expert.
You can hack javac, notably with JSR 269 (pluggable annotation processing). You can hook into the visitor that traverses the statements in the source code and transform them.
Here, for instance, is the core of a transformation that adds support for Roman numerals in Java (read the complete post for more details, of course). It seems relatively easy.
public class Transform extends TreeTranslator {
@Override
public void visitIdent(JCIdent tree) {
String name = tree.getName().toString();
if (isRoman(name)) {
result = make.Literal(numberize(name));
result.pos = tree.pos;
} else {
super.visitIdent(tree);
}
}
}
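The helpers isRoman and numberize are not shown in the excerpt. One plausible way to write them is sketched below; this is an assumption about their behaviour, not the code from the linked post, and the RomanNumerals class name is invented.

```java
class RomanNumerals {
    // True if the identifier consists only of Roman digits. This is a loose
    // check: it also accepts malformed numerals such as IIII.
    static boolean isRoman(String name) {
        return name.matches("[IVXLCDM]+");
    }

    private static int digitValue(char c) {
        switch (c) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default: throw new IllegalArgumentException("not a Roman digit: " + c);
        }
    }

    // Converts a Roman numeral to its integer value using the subtractive rule:
    // a digit placed before a larger digit is subtracted instead of added.
    static int numberize(String roman) {
        int total = 0;
        for (int i = 0; i < roman.length(); i++) {
            int current = digitValue(roman.charAt(i));
            int next = (i + 1 < roman.length()) ? digitValue(roman.charAt(i + 1)) : 0;
            total += (current < next) ? -current : current;
        }
        return total;
    }
}
```

Note that with a check this loose, any identifier made only of those letters (a variable named I or MIX, say) would be rewritten into a number, which is one reason such a transformation is better treated as a demonstration.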
Here are additional resources:
Hacker's guide to the java compiler
Javac hacker resources
I don't know if project Lombok (cited in the other answer) uses the same technique, but I guess yes.
Charles Nutter, the tech lead of JRuby, extended javac with literal regular expressions. He had to change about 3 lines of code, as far as I recall.
See http://twitter.com/headius/status/1319031705
And here is an awesome tutorial on how to add a new operator to javac, http://www.ahristov.com/tutorial/java-compiler.html
For more links like that, see my list of Links for javac hackers .