Is there a way to compile to hide the source code? - java

Using Play or Grails or any other JVM framework;
Is there a way (or what is the way) to fully compile the generated war/jar files so that the source code is hidden and cannot be decompiled?
Or even after compilation, is it possible to easily decompile and get strings and classes? e.g. db connection et al.
Thank you.

No, you cannot compile anything without the possibility of decompiling. That said, you can do some things to make the process more costly.
The real trick is to keep the cost low for you and high for others. In short, expect to pay more in time, money, and inconvenience, and realize that you have only made the challenge harder in one particular way (one that might later become easy to circumvent). But look on the bright side: the entire software industry has gotten along just fine without absolute protection against decompiling.
Sign and seal your JAR files. This prevents people from adding things to your JAR files and prevents people from replacing parts of your code (to get a better understanding of the operating program).
Consider a class / method name obfuscator. This will rename your classes and method names into an equivalent structure that contains small names like "a.a(..)" instead of "Client.connect(...)". This makes it harder for others to read your code (and others includes yourself in this case, so if you intend to debug, this increases your cost to support the code). Oh, and this breaks any reflection, so you must provide work-arounds and fixes for reflection.
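The reflection problem mentioned above can be sketched as follows. This is a minimal illustration, not the behavior of any particular obfuscator: the class name com.example.Client is a hypothetical stand-in for a class that has been renamed to something like "a", so a lookup by its original name no longer finds it.

```java
// Hypothetical sketch: "com.example.Client" stands in for a class that an
// obfuscator has renamed, so the hard-coded name no longer resolves.
final class ReflectionAfterRename {
    // Returns true if a class with exactly this name can still be loaded.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class keeps its name, so the lookup succeeds.
        System.out.println(canLoad("java.lang.String"));
        // A renamed application class is no longer reachable by its old name.
        System.out.println(canLoad("com.example.Client"));
    }
}
```

The usual work-around is to tell the obfuscator to keep the names of any classes and members that are looked up by string.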
If you provide any kind of decent logging, you also need to obfuscate the logging, otherwise one need only read the log messages emitted from a class to figure out that class "h" is the DatabaseConnection, class "k" is the "User" data object, etc.
Embedded strings in your classes can always be extracted. So, if you want to protect them, you must embed "scrambled" strings, and "descramble" them prior to use. Doing so has a CPU overhead, and as soon as the "descrambling" routine is known, the entire process can be circumvented.
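A minimal sketch of the scramble/descramble idea, assuming a repeating-key XOR plus Base64 purely for illustration. The class name, key, and sample string are hypothetical. Note that the key sits in the same class file, which is exactly why the approach is circumvented once the routine is found.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the key and the sample string are illustrative only.
final class ScrambledStrings {
    // The key ships inside the class file, so a decompiler reveals it.
    private static final byte[] KEY = "not-a-secret".getBytes(StandardCharsets.UTF_8);

    // XOR each byte with the repeating key; applying it twice restores the input.
    private static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ KEY[i % KEY.length]);
        }
        return out;
    }

    // Run at build time to produce the literal that gets embedded in the code.
    static String scramble(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(xor(bytes));
    }

    // Run just before use; this is where the CPU overhead comes from.
    static String descramble(String scrambled) {
        byte[] bytes = Base64.getDecoder().decode(scrambled);
        return new String(xor(bytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String embedded = scramble("jdbc:postgresql://db.example.com/app");
        System.out.println(embedded);             // what a string dump would show
        System.out.println(descramble(embedded)); // what the program actually uses
    }
}
```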
Exotic solutions exist, like rewriting your code into equivalent code that performs similar operations. The problem is that for the end deliverable to be useful, it must still behave identically to the original, yet when you debug it, the output no longer follows the original code.
Often one wants to protect the ability to solve the problem, not really the source code. Keep this in mind: once you deliver something that works, copying the already-compiled elements is often enough to defeat the "this code is mine" mindset. If you really want control over your code, don't release it; set up a server and offer the software "as a service" on your own hardware.

What you are looking for is called obfuscation. There are several popular bytecode obfuscators for Java.

Do a quick search for grails or groovy code obfuscators and it should generate a bunch of results. It's fairly easy to decompile afterwards if you know what you're doing. There's no foolproof way.

Related

Decompiling a jar file and modifying the source to hack an application. How to prevent this? [duplicate]

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
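The custom class loader idea could look roughly like the sketch below. It rests on loud assumptions: a single-byte XOR stands in for a real cipher, and an in-memory map stands in for the encrypted class files on disk. All names here are hypothetical. As the answer notes, this loader itself ships unencrypted, so both the algorithm and the key can be read from it.

```java
import java.util.Map;

// Hypothetical sketch: XOR is a placeholder for real encryption, and the map
// is a placeholder for encrypted .class files stored inside the JAR.
final class DecryptingClassLoader extends ClassLoader {
    private final Map<String, byte[]> encryptedClasses;
    private final byte key;

    DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, byte key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key;
    }

    // Symmetric transform: the same call encrypts and decrypts.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Called by loadClass only after the parent loader fails to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] encrypted = encryptedClasses.get(name);
        if (encrypted == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] plain = xor(encrypted, key);
        // The decrypted bytecode exists in memory here, where a debugger can dump it.
        return defineClass(name, plain, 0, plain.length);
    }
}
```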
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keep in mind that this does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are disassemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
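IsDebuggerPresent is a Windows API and is not shown here. As a rough Java analogue (a swapped-in technique, not what the answer describes), a program can inspect its own JVM startup arguments for the JDWP debug agent. The class and method names below are hypothetical, and the check is just as easy to defeat: it only sees debuggers requested on the command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical sketch of a Java-side debugger check; trivially bypassed.
final class DebuggerCheck {
    // True if any argument asks the JVM to start the JDWP debug agent.
    static boolean looksDebugged(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp")
                    || arg.startsWith("-Xrunjdwp")
                    || arg.equals("-Xdebug")) {
                return true;
            }
        }
        return false;
    }

    // Applies the check to the arguments this JVM was actually started with.
    static boolean debuggerRequestedAtStartup() {
        return looksDebugged(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }

    public static void main(String[] args) {
        if (debuggerRequestedAtStartup()) {
            System.out.println("Debug agent detected, exiting.");
            System.exit(1);
        }
        System.out.println("No debug agent on the command line.");
    }
}
```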
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This is not something I have put into practice myself, but I think the following is a good collection of resources and tutorials on the subject.
A suggestion from the Oracle community website:
1. (Clean way) Obfuscate your code. There are many free and open source obfuscator tools; here is a simple list of them: [Open source obfuscators list].
These tools make your code unreadable (though it can still be decompiled) by changing names. This is the most common way to protect your code.
2. (Not so clean way) If you have a specific target platform (like Windows), or you can ship different versions for different platforms, you can write a sophisticated part of your algorithm in a low-level language like C (which is very hard to decompile and understand) and use it as a native library in your Java application. It is not clean, because many of us use Java for its cross-platform abilities, and this method undermines that ability.
And this one is a step-by-step guide:
ProtectYourJavaCode
Enjoy!
Please keep adding your solutions; we need more of them.

What are the limitations and advantages of Cross Compiler for C# to java?

I want to migrate my entire C# 4.0 (.Net 2010) desktop application to Java. I don't know of any tool available for that. Please suggest a good one.
Also, I would like to know what the limitations and advantages of a cross compiler for C# to Java are.
Please guide me out of this problem...
Saravanan.P
Crosscompilers will usually produce rather messy code, and sometimes code that doesn't even compile.
Some (maybe most) will force your new code into having bindings with custom libraries from the crosscompiler, and thus be forever linked to that product.
Your new code will be very hard to maintain and expand as a result, and when compiled it might well perform worse than the old code.
In general, you would most likely be better off rewriting the application yourself (or hiring people to do so) if it is going to have to be used and maintained actively for more than a short, transitional period.
That said, for some things a crosscompiler can be helpful. For example start with a crosscompiled version and over time replace that codebase with newly written code, this would get you working more quickly and you'd not have to maintain 2 separate code bases, in 2 different languages, using 2 toolsets, at the same time.

What are the pro and cons of having localization files vs hard coded variables in source code?

Definitions:
Files:
Having the localization phrases stored in a physical file that gets read at application start-up and the phrases are stored in the memory to be accessed via util-methods. The phrases are stored in key-value format. One file per language.
Variables:
The localization texts are stored as hard code variables in the application's source code. The variables are complex data types and depending on the current language, the appropriate phrase is returned.
Background:
The application is a Java Servlet and the developers use Eclipse as their primary IDE.
Some brief pros and cons:
Since Eclipse is used, tracking and finding unused localizations is easier when they are saved as variables than when they are kept in a file. However, the application's source code becomes bigger and bloated.
What are the pros and cons of having localization text in files versus hard-coded variables in source code? What do you do and why?
Update 1: In my specific case, recompiling and deploying is not an issue, since our test phases give us a chance to find typos, etc. Because of that, we rarely need to change the phrases once the application is in production.
Having the localization of app in its source code has many disadvantages - probably the biggest being, that when you want to fix/add/remove some localization, you have to recompile it and redistribute a new version. With separate files, the updates are more flexible, faster and easier to maintain and of course others can add localization to your app without the need to have access to the code.
So I would recommend going with the separate files option, not hardcoding it into the app.
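A minimal sketch of the "Files" option as defined in the question: one key-value source per language, read once and kept in memory behind a utility method. The class name, the "??key??" fallback, and the sample phrases are assumptions made for illustration. In a real application, java.util.ResourceBundle is the standard mechanism for this.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical phrase store: one Properties object per language code.
final class Phrases {
    private final Map<String, Properties> byLanguage = new HashMap<>();

    // Reads key=value lines for one language, typically at application start-up.
    void load(String language, Reader source) {
        Properties phrases = new Properties();
        try {
            phrases.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byLanguage.put(language, phrases);
    }

    // Returns the phrase, or a visible marker so missing translations stand out.
    String get(String language, String key) {
        Properties phrases = byLanguage.get(language);
        String value = (phrases == null) ? null : phrases.getProperty(key);
        return (value != null) ? value : "??" + key + "??";
    }

    public static void main(String[] args) {
        Phrases phrases = new Phrases();
        // StringReader stands in for a per-language file opened with a FileReader.
        phrases.load("en", new StringReader("greeting=Hello\nfarewell=Goodbye\n"));
        phrases.load("sv", new StringReader("greeting=Hej\n"));
        System.out.println(phrases.get("sv", "greeting"));
        System.out.println(phrases.get("sv", "farewell"));
    }
}
```

Adding a language then means adding one more file, with no change to the calling code.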
I think there is no real choice between "Files" and "Variables", because it should always be "Files".
Pros:
a) Easy to maintain - The entire localization is in a single file.
b) No recompile required when there is a change.
c) Easy to introduce another language.
Usually translators are non-technical people.
Not necessary to change in multiple places.
If your project will be used by hundreds of people, localization is worth it because odds are some of them are more familiar with another language. If this project is only for internal use, then hardcoded variables are okay. If the number of users is below 100, the tasks of finding translators and maintaining each localization file are too cumbersome.
If you are going to have localized versions of your program, then most likely you'll need a translator at some point. If you do not have localization files, you have to give the translator access to every source file with string resources (do you even know which they are?) and then you'll need to modify the source manually, which is prone to error.
For large software companies, the localization process has traditionally been performed by a separate group from the development team (often in a different country). The localization group would often work with binaries - hence the requirement for resource bundles to be separate artefacts. One of the goals of the process is to not let translators break source code; another is to not let programmers break strings.
From your comments, it sounds like you have tools to do some alternative form of automated String extraction and insertion. This might be a viable alternative if done right.
Questions I'd be asking about your translation process:
Do you have tools to automate resource extraction and file rewriting?
Any cutting and pasting of strings (especially strings you can't understand) is likely to be a vector for bugs.
Can these tools distinguish between resource strings and untranslatable strings?
Are you giving translators a standard file format they have the tools to work with?
Do you have tools/processes for String recovery from previous versions?
Is the size of the source code increasing significantly because you're adding translations?
When it comes to application logic, the strings are clutter you don't want to see and there's something to be said for separating logic and presentation data
Is this approach going to scale if you add more languages?
Are you introducing any encoding, keyboard or font issues?
There's not much point hard-coding strings if your editors break them
You can overcome the lack of character support using \uXXXX escaping
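As a hedged illustration of the escaping point: java.util.Properties decodes backslash-u escapes when it loads text, so a resource file can stay pure ASCII even when the phrases are not. The helper names below are hypothetical, and the escape method is a simplified sketch that only handles non-ASCII characters (it does not escape backslashes or other characters that are special in property files).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helpers showing an ASCII-only round trip for translated text.
final class UnicodeEscapes {
    // Replaces every non-ASCII char with a backslash-u escape of four hex digits.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Parses property-file text and returns one value; escapes are decoded by load().
    static String readValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String line = "greeting=" + escape("Gr\u00fc\u00dfe");
        System.out.println(line);                        // ASCII-only file content
        System.out.println(readValue(line, "greeting")); // decoded phrase
    }
}
```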
Long-lived products can accumulate unused resources, but I've never seen it become a practical problem. If you feel strongly about it, you can probably detect unused keys by scanning the sources.
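Scanning the sources for unused keys could be sketched like this. It assumes, as a simplification, that every key is referenced somewhere as a quoted string literal; keys that are built dynamically at runtime would be reported as unused by mistake. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical scanner: a key counts as "used" if it appears as a quoted literal.
final class UnusedKeys {
    static Set<String> findUnused(Set<String> keys, Iterable<String> sourceTexts) {
        Set<String> unused = new TreeSet<>(keys);
        for (String text : sourceTexts) {
            // Drop every key that this source file mentions in double quotes.
            unused.removeIf(key -> text.contains("\"" + key + "\""));
        }
        return unused;
    }

    public static void main(String[] args) {
        Set<String> keys = new TreeSet<>(Arrays.asList("greeting", "farewell", "old.banner"));
        // In practice these strings would be the contents of the .java files.
        Iterable<String> sources = Arrays.asList(
                "out.println(phrases.get(lang, \"greeting\"));",
                "out.println(phrases.get(lang, \"farewell\"));");
        System.out.println(findUnused(keys, sources));
    }
}
```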

Writing a Java standard class library alternative from the scratch

I am just curious, but I want to know whether it is feasible to completely remove the Java standard class libraries that come with the JVM and start a new one from scratch [à la ClassPath].
If that is possible, what classes MUST be implemented as minimum? (Object and String come to my mind, but... I do not know).
Such thing breaks some license? Is there any way to say to the "java" command to "not use the rt.jar"?
Thanks in advance,
Ernesto
You can use the -Xbootclasspath option to specify your own set of core classes.
If you do go down this path, you will probably end up with a lot of problems if you intend to also use third party libraries as they will depend on the core API and any inconsistencies in your version will likely cause bugs.
As an absolute minimum you'd probably have to reimplement everything in the java.lang package. As well as Object and String, the primitive wrapper classes need to be present in order for auto-boxing to work. I don't think you can replace java.lang without a fair bit of native code to make things like threads work.
In theory, "yes" it is possible, though you might also need to implement your own JVM! The relationships between the JVM and some of the low level classes (Object, Class, Thread, etc) are such that these classes have to be implemented as part of the JVM.
In practice, it is such a big task that you'd be working on it for the rest of your life, and the chances are that nobody would use your code even if you succeeded. That doesn't sound like "fun" to me.
Such thing breaks some license?
Not per se. But if you ever tried to release it calling it "Java", Sun's lawyers would be after you for trademark infringement. You can only legally call your implementation Java if it has been validated against the Sun TCK.
But I don't want to be totally discouraging. If you want to hack on the internals of a JVM or stuff like that, the JNode project is always looking for keen new people.
No, it is not feasible at all. I mean, sure, you could do it, but you aren't going to do it better than a large corporation or open source project with years of experience and large numbers of Java gurus. It might be fun to geek it up though.

Creating non-reverse-engineerable Java programs

Is there a way to deploy a Java program in a format that is not reverse-engineerable?
I know how to convert my application into an executable JAR file, but I want to make sure that the code cannot be reverse engineered, or at least, not easily.
Obfuscation of the source code doesn't count... it makes it harder to understand the code, but does not hide it.
A related question is How to lock compiled Java classes to prevent decompilation?
Once I've completed the program, I would still have access to the original source, so maintaining the application would not be the problem. If the application is distributed, I would not want any of the users to be able to decompile it. Obfuscation does not achieve this as the users would still be able to decompile it, and while they would have difficulty following the action flows, they would be able to see the code, and potentially take information out of it.
What I'm concerned about is if there is any information in the code relating to remote access. There is a host to which the application connects using a user-id and password provided by the user. Is there a way to hide the host's address from the user, if that address is located inside the source code?
The short answer is "No, it does not exist".
Reverse engineering does not necessarily involve looking at the code at all. It basically means trying to understand the underlying mechanisms and then mimicking them. For example, that is how JScript came out of Microsoft's labs: by copying the behavior of Netscape's JavaScript without having access to the code. The copy was so faithful that even the bugs were copied.
You could obfuscate your JAR file with YGuard. It doesn't obfuscate your source code, but the compiled classes, so there is no problem about maintaining the code later.
If you want to hide some string, you could encrypt it, making it harder to get it through looking at the source code (it is even better if you obfuscate the JAR file).
If you know which platforms you are targeting, get something that compiles your Java into native code, such as Excelsior JET or GCJ.
Short of that, you're never going to be able to hide the source code, since the user always has your bytecode and can Jad it.
You're writing in a language that has introspection as part of the core language. It generates .class files whose specifications are widely known (thus enabling other vendors to produce clean-room implementations of Java compilers and interpreters).
This means there are publicly-available decompilers. All it takes is a few Google searches, and you have some Java code that does the same thing as yours. Just without the comments, and some of the variable names (but the function names stay the same).
Really, obfuscation is about all you can get (though the decompiled code will already be slightly obfuscated) without going to C or some other fully-compiled language, anyway.
Don't use an interpreted language? What are you trying to protect anyway? If it's valuable enough, anything can be reverse engineered. The chances of someone caring enough to reverse engineer most projects are minimal. Obfuscation provides at least a minimal hurdle.
Ensure that your intellectual property (IP) is protected via other mechanisms. Particularly for security code, it's important that people be able to inspect implementations, so that the security is in the algorithm, not in the source.
I'm tempted to ask why you'd want to do this, but I'll leave that alone...
The problem I see is that the JVM, like the CLR, needs to be able to interpret your code in order to JIT compile and run it. You can make it more "complex", but given that the spec for bytecode is rather well documented, and exists at a much higher level than something like the x86 assembler spec, it's unlikely you can "hide" the process flow, since it has to be there for the program to work in the first place.
Make it into a web service. Then you are the only one that can see the source code.
It can't be done.
Anything that can be compiled can be de-compiled. The very best you can do is obfuscate the hell out of it.
That being said, there is some interesting stuff happening in Quantum Cryptography. Essentially, any attempt to read the message changes it. I don't know if this could be applied to source code or not.
Even if you compile the code into native machine language, there are all sorts of programs that let you essentially decompile it into assembly language and follow the process flow (OllyDbg, IDA Pro).
It cannot be done. This is not a Java problem. Any language that can be compiled can be decompiled; for Java, it's just easier.
You are trying to show somebody a picture without actually showing it to them. It is not possible. You also cannot hide your host, even if you hide it at the application level. Someone can still grab it via Wireshark or any other network sniffer.
As someone said above, your executable can always be decompiled through reverse engineering. The only way to protect your source code (or algorithm) is not to distribute your executable.
Separate your application into server code and a client app. Hide the important part of your algorithm in the server code and run it on a cloud server, and distribute only the client code, which does nothing but send and receive data.
This way, even if your client code is decompiled, you are not losing anything.
But this will certainly decrease performance and user convenience.
This may not be the answer you are looking for, but I wanted to raise a different idea for protecting source code.
With anything interpreted, at some point it has to be processed "in the clear". The string would show up clear as day once the code is run through JAD. You could deploy an encryption key with your app or do a basic Caesar cipher to encrypt the host connect info and decrypt at runtime...
But at some point during processing the host connection information must be put in the clear in order for your app to connect to the host...
So you could hide it statically, but you can't hide it at runtime if they are running a debugger.
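For illustration, a basic Caesar cipher over the host name might look like the sketch below. The class name, shift value, and host are hypothetical. It shows the point made above: the string is hidden from a casual string dump, but the plain host name exists in memory the moment it is decoded, and it travels over the network regardless.

```java
// Hypothetical sketch: shifts ASCII letters only and leaves dots and digits as-is.
final class CaesarHost {
    static String shift(String text, int by) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + Math.floorMod(c - 'a' + by, 26)));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + Math.floorMod(c - 'A' + by, 26)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only the shifted form would be stored as a literal in the shipped code.
        String stored = shift("db.example.com", 3);
        System.out.println(stored);
        // Decoding happens at runtime; a debugger breakpoint here sees the real host.
        System.out.println(shift(stored, -3));
    }
}
```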
This is impossible. The CPU will have to execute your program, i.e. your program must be in a format that a CPU can understand. CPUs are much dumber than humans. Ergo, if a CPU can understand your program, a human can.
Having concerns about concealing the code, I'd run ProGuard anyway.
