Sometimes when decompiling Java code, the decompiler doesn't manage to decompile it properly and you end up with little bits of bytecode in the output.
What are the weaknesses of decompilers? Are there any examples of Java source code that compiles into difficult-to-decompile bytecode?
Update:
Note that I'm aware that exploiting this information is not a safe way to hide secrets in code, and that decompilers can be improved in the future.
Nonetheless, I am still interested in finding out what kinds of code fox today's crop of decompilers.
Any Java byte code that's been through an obfuscator will have "ridiculous" output from the decompiler. Also, when you have other languages like Scala that compile to JVM byte code, there's no rule that the byte code must be easily representable in Java, and it likely isn't.
Over time, decompilers have to keep up with the new language features and the byte code they produce, so it's plausible that new language features are not easily reversed by the tools you're using.
Edit: As an example in .NET, the following code:
lock (this)
{
DoSomething();
}
compiles to this:
Monitor.Enter(this);
try
{
DoSomething();
}
finally
{
Monitor.Exit(this);
}
The decompiler has to know that C# (as opposed to any other .NET language) has a special syntax dedicated to exactly those two calls. Otherwise you get unexpected (verbose) results.
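The Java analogue is the synchronized block. Here is a minimal sketch (the class and method names are made up for illustration): the first method uses the language syntax, which javac lowers to monitorenter/monitorexit instructions plus an exception handler that releases the monitor; the second spells out the same acquire/try/finally shape by hand using a ReentrantLock. A decompiler has to recognise the first pattern to give you back a synchronized block rather than something closer to the second.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockSugarDemo {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    // Language-level syntax: the compiler emits monitorenter/monitorexit
    // and an exception handler that releases the monitor on any throw.
    int incrementWithSynchronized() {
        synchronized (monitor) {
            return ++counter;
        }
    }

    // The same acquire/try/finally shape written out explicitly.
    // Note: this uses a different lock object than the method above,
    // so the two methods do not exclude each other.
    int incrementWithExplicitLock() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }
}
```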
The JDBC type-4 drivers for DB2 Connect are classics. Everything has one- or two-letter names, there is irrelevant code that ends up having no effect, and more. I once tried to take a look to debug a particularly annoying problem and basically gave up. I'm hoping (but by no means confident) that this was passed through an obfuscator rather than the code actually looking like that.
Another favorite trick (although I can't remember the product) was to rename all objects to be constructed from the set {'0','O','l','1'}, which made reading it very difficult.
Assuming you can decompile back to a reasonable style of source code (you can't always do that), what is hard to "reverse engineer" are algorithms that operate in unfamiliar problem domains. If you don't understand Fast Fourier transforms, it doesn't matter much if you can get back the code that implements an FFT Butterfly.
(If this phrase is unfamiliar to you, I've already won if I encode one. If it is familiar to you, you are a pretty good engineer and probably don't have any interest in reverse engineering code). [Your mileage with North Koreans may vary.]
Java keeps a lot of information in the bytecode (for instance, many names), so it is relatively easy to decompile. Hard-to-decompile bytecode is mostly generated from hard-to-read source code (so that's not really an option). If you really want to obfuscate your code, use an obfuscator that renames all methods and variables to something unrecognizable.
Exceptions are often difficult to decompile.
However, any code which has been obfuscated or has been written in another language is difficult to decompile.
BTW: Why would you want to know this?
Java bytecode does not correspond directly to Java constructs, so decompiling requires knowing which bytecode sequence corresponds to which Java source construct.
The Soot framework for decompiling java byte code has a lot of information on this, but their webpage is down for me right now.
http://www.sable.mcgill.ca/soot/
Related
I have come across many Java obfuscators which just rename the classes, and their output can still be viewed with online Java decompilers. But I want an obfuscator which generates output that cannot be decompiled using any tool. (We can obfuscate .NET projects in such a way.)
Please suggest such a Java obfuscator.
Search for "flow obfuscation".
Decompilation is always possible. However, a lot of decompilers expect bytecode to be generated by regular compilers like javac and have trouble restoring compilable Java sources from flow-obfuscated classes. The results are more often than not so broken that it is hard for a human to recognize the original (Java high-level) control flow. That creates an additional hurdle that can only be overcome by investing more time in bytecode analysis.
Be aware, however, that it's a race: decompilers are also becoming better at this. So you should always test the obfuscated results against all decompilers you can get hold of.
We've used an obfuscator called "allatori" (together with a second mostly name-based obfuscator) in a project with quite satisfactory results.
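To give a rough idea of what flow obfuscation does, here is a sketch of one common variant, control-flow flattening, written at the source level. Real obfuscators work on the bytecode and produce jumps that have no direct Java equivalent; the class and method names below are invented for the example. Both methods compute the same sum, but the second hides the loop behind a state-machine dispatcher.

```java
final class FlattenedFlowDemo {
    // Straightforward version: the loop structure is obvious.
    static int sumPlain(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Flattened version: every basic block becomes a case, and a state
    // variable decides which block runs next.
    static int sumFlattened(int n) {
        int sum = 0, i = 0, state = 0;
        while (true) {
            switch (state) {
                case 0:  // loop initialisation
                    i = 1;
                    state = 1;
                    break;
                case 1:  // loop condition
                    state = (i <= n) ? 2 : 3;
                    break;
                case 2:  // loop body and increment
                    sum += i;
                    i++;
                    state = 1;
                    break;
                default: // loop exit
                    return sum;
            }
        }
    }
}
```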
As Stefan mentioned, flow obfuscation is an easy way to mess with older decompilers. For modern decompilers you're going to have to put in more effort. Here are some features you can search for:
Synthetic Modifiers (JD-GUI does not show synthetic members)
Opaque Predicates (Think of this as junk code that can't be resolved statically)
String Encryption
Not sure what it's called, but replacing all invokes with invokedynamic instructions is cruel. It requires Java 7 or later (where invokedynamic was introduced) to run and decompile.
Everything is reversible but it'll make it more of a pain.
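Two of the items above, opaque predicates and string encryption, can be sketched in a few lines. This is only a toy, with made-up names: the predicate here (the product of two consecutive ints is always even, even when the multiplication overflows) is simple enough that a good analyser could fold it away, and real obfuscators use much harder ones. The XOR "encryption" is likewise just the simplest possible stand-in.

```java
final class ObfuscationTricksDemo {
    // Opaque predicate: always true, because x and x + 1 always contain
    // one even number, so the low bit of the product is 0.
    static boolean opaqueTrue(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    static int doubleIt(int input) {
        if (opaqueTrue(input)) {
            return input * 2;      // the real code path
        }
        return input ^ 0x5A5A;     // junk branch, never executed
    }

    // XOR every character with the key; applying it twice restores the text.
    static String xor(String text, char key) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] ^= key;
        }
        return new String(chars);
    }

    // The class file only contains "obkkh"; "hello" exists only at run time.
    static String greeting() {
        return xor("obkkh", '\u0007');
    }
}
```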
How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
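For what it's worth, generating such names is trivial. A sketch (the class and method are hypothetical, not taken from any particular obfuscator): map each symbol's index to a fixed-length string over 'O', '0', 'l' and '1', making sure the first character is a letter so the result is still a legal Java identifier.

```java
final class ConfusingNames {
    private static final char[] ALPHABET = {'O', '0', 'l', '1'};

    // Returns a distinct name for every index below 2 * 4^(length - 1).
    static String nameFor(int index, int length) {
        StringBuilder name = new StringBuilder(length);
        // The first character must be a letter: use the lowest bit to pick O or l.
        name.append((index & 1) == 0 ? 'O' : 'l');
        int rest = index >>> 1;
        // Each remaining character encodes two more bits of the index.
        for (int i = 1; i < length; i++) {
            name.append(ALPHABET[rest & 3]);
            rest >>>= 2;
        }
        return name.toString();
    }
}
```

A renaming tool would then call something like nameFor(n, 11) for the n-th class, method or field it encounters.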
First, you can't prevent people from reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (the same applies to the .NET CLR). You can only make it more difficult, raising the barrier (i.e. cost) to seeing and understanding your code.
The usual way is to obfuscate the code with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating, use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
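A bare-bones sketch of such a class loader, under some loud assumptions: the "encryption" is a single-byte XOR (a real product would use a proper cipher), and the encrypted class bytes are handed over in a Map rather than read from the jar. All names are invented for the example.

```java
import java.util.Map;

final class DecryptingClassLoader extends ClassLoader {
    private final Map<String, byte[]> encryptedClasses; // class name -> encrypted .class bytes
    private final byte key;

    DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, byte key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key;
    }

    // Toy cipher: XOR with one byte. Encrypting and decrypting are the same operation.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Called by loadClass() when the parent loader cannot find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] encrypted = encryptedClasses.get(name);
        if (encrypted == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] plain = xor(encrypted, key);
        return defineClass(name, plain, 0, plain.length);
    }
}
```

As the answer says, anyone who decompiles this loader sees both the algorithm and the key, so it only slows an attacker down.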
You could also try to convert the Java application to a Windows EXE, which would hide the clue that it's Java at all (to some degree), or really compile it into machine code, depending on your need for JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keep in mind that this does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are disassemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on Windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
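IsDebuggerPresent is a Windows API; a rough JVM-side counterpart (a different technique, and just as easy to defeat) is to look at the JVM's startup arguments for the JDWP debug agent. The sketch below uses the standard RuntimeMXBean; the class name and the list of argument prefixes are my own choice and are not exhaustive.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class DebuggerCheck {
    // Pure helper so the matching logic can be exercised with any argument list.
    static boolean looksLikeDebugAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp")
                    || arg.startsWith("-Xrunjdwp")
                    || arg.equals("-Xdebug")) {
                return true;
            }
        }
        return false;
    }

    // Inspects the arguments the current JVM was actually started with.
    static boolean debuggerProbablyEnabled() {
        return looksLikeDebugAgent(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```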
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This is not a solution I have put into practice myself, but I think these are good resources and tutorials for getting as close to it as possible.
A suggestion from this website (oracle community)
1. (Clean way) Obfuscate your code. There are many open source and free obfuscator tools; here is a simple list of them: [Open source obfuscators list]. These tools make your code unreadable (though you can still decompile it) by changing names. This is the most common way to protect your code.
2. (Not so clean way) If you have a specific target platform (like Windows), or you can have different versions for different platforms, you can write a sophisticated part of your algorithms in a low-level language like C (which is very hard to decompile and understand) and use it as a native library in your Java application. It is not clean, because many of us use Java for its cross-platform abilities, and this method sacrifices that ability.
And the one below is a step-by-step guide:
ProtectYourJavaCode
Enjoy!
Please keep adding your solutions; we need more of them.
If I wanted to create a new language for Java I should make a compiler that is able to generate the byte-code compatible with the JVM spec, right? and also for the JDK libraries?
Where can I find some info?
Thanks.
Depends what you mean by "create a new language for Java" -- do you mean a language that compiles to bytecode and the code it generates can be used from any Java app (e.g. Groovy) or an interpreted language (for which you want to write a parser in Java)?
If it is the former, then @Joachim is right: look at the JVM spec. For the latter, look at the likes of JavaCC for creating a parser for your language grammar.
I would start with a compiler which produced Java source. You may find this easier to read/understand/debug. Later you can optimise it to produce byte code.
EDIT:
If you have features which cannot be easily translated to Java code, you should be able to create a small number of byte code classes using Jasmin with all the exotic functionality which you can test to death. From the generated Java code this will look like a plain method call. The JVM can still inline the method so this might not impact performance at all.
The Java Virtual Machine Spec should have most of what you need.
An excellent library for bytecode generation/manipulation is ASM: http://asm.ow2.org.
It is very versatile and flexible. Note, however, that its API is based on events (similar to SAX parsers): it reads .class files and invokes a method whenever it encounters a new entity (class declaration, method declaration, statements, etc.). This may seem a bit awkward at first, but it saves a lot of memory compared to the alternative, where the library reads the input, spits out a fully built tree structure, and then you have to iterate over it.
I don't think this will help much practically, but it has a lot of sweet theoretical stuff that I think you'll find useful.
http://www.codeproject.com/KB/recipes/B32Machine1.aspx
I want to migrate my entire C# 4.0 (.Net 2010) desktop application to Java. I don't know of any tool available for that. Please suggest a good one.
Also, I would like to know the limitations and advantages of a cross-compiler from C# to Java.
Please guide me out of this problem...
Saravanan.P
Crosscompilers will usually produce rather messy code, and sometimes code that doesn't even compile.
Some (maybe most) will force your new code into having bindings with custom libraries from the crosscompiler, and thus be forever linked to that product.
Your new code will be very hard to maintain and expand as a result, and might well perform worse when compiled than the old code did.
In general, you would most likely be better off rewriting the application yourself (or hiring people to do so) if it is going to have to be used and maintained actively for more than a short, transitional period.
That said, for some things a crosscompiler can be helpful. For example start with a crosscompiled version and over time replace that codebase with newly written code, this would get you working more quickly and you'd not have to maintain 2 separate code bases, in 2 different languages, using 2 toolsets, at the same time.
I'm writing an interpreter for a compiler program in Java. So after checking the source code, syntax and semantics, I want to be able to run the source code, which is the input for my compiler. I'm just wondering if I can just translate some tokens, for example, out (it prints stuff on screen), can I just replace it with System.out.print? then feed the source code again to run it in java?
I've heard of using the Java Compiler API, would this be a good plan?
Thank you very much in advance!
What you are asking about is a virtual machine implementation technique. To run your code, in general you need to implement the following:
The first few steps I guess you have already done (design/describe the language semantics, construct the AST and perform the required validation of the code).
You need to generate your byte code. Java itself works in exactly the same way: it generates another representation of the source code, going from human-readable to machine-readable form.
Here you can see what Java byte code looks like: http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/
You need to implement a virtual machine (a.k.a. stack machine) that reads the byte code and executes it.
So as you can see, you should have 3 separate components (projects) for your task:
1. Language grammar
2. Compiler (byte code generator)
3. Virtual machine (interpreter of byte code)
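To make the third component less abstract, here is about the smallest stack machine I can think of. The instruction set (PUSH, ADD, MUL, HALT) and the int-array encoding are invented for this sketch; a real one would add variables, jumps, calls and error handling.

```java
final class TinyStackMachine {
    // Opcodes; PUSH is followed by one operand in the code array.
    static final int PUSH = 0;
    static final int ADD = 1;
    static final int MUL = 2;
    static final int HALT = 3;

    // Executes the program and returns the value on top of the stack at HALT.
    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }
}
```

The compiler's job is then to turn an expression such as (2 + 3) * 4 into the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT.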
P.S. I have experience creating a tiny Java-like compiler from scratch (defining the grammar with ANTLR, implementing the compiler, implementing the virtual machine), so I can probably share more information with you (even source code) if you need something in particular.
You really need to read some books and/or take courses on compilers - this can't be solved by a two-paragraph answer on SO.
You could create a cross-compiler which reads your language and outputs Java code to do the same thing. This may be the simplest option.
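For the out example from the question, the crudest possible version of that idea is a textual rewrite. This is a sketch with made-up names, and it assumes out is only ever used as a call like out(...); it works on raw text, so it would also rewrite an out( that appears inside a string literal. A proper translator would walk the AST you already built during checking.

```java
import java.util.regex.Pattern;

final class OutToJavaTranslator {
    // Matches a call to "out" that is not part of a longer identifier
    // and not a member access such as System.out(...).
    private static final Pattern OUT_CALL = Pattern.compile("(?<![\\w.])out\\s*\\(");

    // Rewrites every out(...) call on the line into System.out.print(...).
    static String translateLine(String line) {
        return OUT_CALL.matcher(line).replaceAll("System.out.print(");
    }
}
```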
The Java Compiler API can be used to compile Java code. You would need to translate your existing code to Java first to use it.
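A minimal sketch of that second step, assuming the translation to Java has already happened: wrap the translated statements in a class, write it to a file and hand it to the system compiler via javax.tools. The helper names are hypothetical; note that ToolProvider.getSystemJavaCompiler() returns null when running on a JRE without a compiler.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class CompileGeneratedSource {
    // Wraps already-translated Java statements in a class with a main method.
    static String generateSource(String className, String statements) {
        return "public class " + className + " {\n"
             + "    public static void main(String[] args) {\n"
             + "        " + statements + "\n"
             + "    }\n"
             + "}\n";
    }

    // Writes the source into dir and compiles it there; returns true on success.
    static boolean compile(Path dir, String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (JDK required)");
        }
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        // null streams mean: use System.in, System.out and System.err.
        int exitCode = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
        return exitCode == 0;
    }
}
```

The resulting .class file can then be loaded with a URLClassLoader pointed at dir, or run in a separate java process.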
This would not be the same thing as writing an interpreter. Is this homework? Does the task say you have to write the interpreter or can you have the code run any way which works?
Unfortunately you did not mention which scripting language you are planning to support. If it is one of the well-known languages, just use a ready-made interpreter for it written in pure Java. See BSF and Java 5 scripting (http://www.ibm.com/developerworks/java/library/j-javascripting1/)
If it is your own language,
think twice: do you really need it?
If you are sure you need your own language, think about JavaCC.
First of all, thank you very much for the fast replies.
As part of our compiler project, we need to be able to compile and run a program written in our own specified language. The language is very similar to C. I am confused on how an interpreter works, is there a simpler way to implement this? Without generating byte codes? My idea was to translate each statement into Java equivalent statements, and let Java handle the byte code generation.
I would look into the topics mentioned. Again, thank you very much for the suggestions.