I got the Java classes from an APK after using some tools like dex2jar and JD-GUI. As everybody knows Java byte code can be converted to Java classes back so mostly it is optimized and obfuscated through some tools (like ProGuard is used in the case of Android) to make it secure from others. So what I got is obfuscated code and I want to make it error-free, readable, understandable so that I can further modify it for my own purpose (for my personal use only, I don't mean to violate any copyrights). So any help i.e advices, tools, helping material to make this obfuscated code much closer to what was written by a developer or to make it error-free and understandable will help me a lot. Currently my focus is about to reversing obfuscating techniques used by ProGuard like when I tried reverse engineering on my own projects and found that:
int resource values can be altered with ids by matching through the R file which is generated with reverse engineering.
The if/else conditions mostly converted to while(true) and some continues and breaks.
Inner classes mostly broke up to separate files
So, any other techniques and helping material for the above mentioned ways which can describe how to properly reverse them will be very helpful.
There isn't a magical tool that will refactor obfuscated code into a buildable project. Most likely, you won't be able to decompile and de-obfuscate an APK to be clean and maintainable code. This is a good thing.
There are tools which are better than dex2jar and jd-gui. One of them is apk-deguard, which claims to reverse the process of obfuscation. From their about page:
DeGuard
DeGuard (http://www.apk-deguard.com) is a novel system for statistical
deobfuscation of Android APKs, developed at the Software Reliability
Lab, ETH Zurich, the same group which developed the widely used JSNice
system. Similarly to JSNice, DeGuard is based on powerful
probabilistic graphical models learned from thousands of open source
programs. Using these models, DeGuard recovers important information
in Android APKs, including method and class names as well as
third-party libraries. DeGuard can reveal string decoders and classes
that handle sensitive data in Android malware.
You should use Enjarify, which is owned by Google, instead of dex2jar. Also, apktool is good for decompiling an APK's resources, which is not handled by dex2jar and enjarify.
Other tools include jadx, procyon, fernflower, show-java, smali/baksmali.
You will need a good IDE for refactoring. JEB looks like a good tool for refactoring. This is a paid tool mostly used by Android security researchers.
This should help:
DeObfuscator
Reverse engineering is a difficult task (i would say subtle art), mostly hit and miss, especially with obfuscated code, what you can do is to focus in some special function, that seems pretty obvious and start from there, renaming and refactoring classes, also a good IDE may help you a lot (my personal recommendation: NetBeans).
Related
I have been working on a project alone for more than two years for a company. The project is a really big one using rxtx to communicate with a hardware device. I used Java 8 and JAVAFX for the UI. Now it is almost finished and I am starting to search how to deliver the end user application that the company will distribute over its clients.
The problem is that the company I am working with wants the code to be non reachable when the software is between final clients hands because the Java code contains some extremely sensitive information that could have very bad consequences for the company if final clients happened to know them. The clients can literally perform actions they don’t have the right to perform.
So after searching (a lot) and thinking relatively to my case, I understood that giving a JAR obfuscated isn’t the solution. I then tried to generate a JAR and then transform it to an EXE but all I succeeded on was wrapping the JAR into EXE which does not prevent extracting the JAR and then seeing all the code easily. Finally, I found that I should use AoT compilation like GCJ compiler to produce native binary exe from my Java code but here I am stuck because after watching videos and reading articles etc I didn’t manage to find a clear way to produce the native binary exe.
I am now confused since I don’t know if I am on the right path and good direction or if I am totally wrong and there is another way of protecting the code (at least from non professional hackers, I understand that it is not possible to make it 100% safe but I am just searching for a reasonable and good way). How should I manage this final step of my work?
I currently work for a company that has code that we don't want anyone to have access to for the security of our clients and-- less important-- for legal reasons. ;-)
One possible solution you could look into would be to rewrite the code you deem most sensitive into a C/C++ library. It would be possible to compile this into a .so/.dll/.dylib file for the respective OSs and it would make it difficult, not entirely impossible, but difficult to decompile.
The trouble would come from learning how to access native code from Java as much of the documentation is not helpful or just simply nonexistent. This would utilize the Java Native Interface (JNI) which allows Java to, well, interface with the native (compiled C/C++) code. This would make it possible to create a Jar file that would effectively become a Java library for you to access throughout the rest of your project. The native code, however will still need to be loaded at runtime, but that's apart of learning how JNI works. A helpful link I found for JNI is http://jnicookbook.owsiak.org/ (for as long as it's still a functional link).
One of our clients here where I work has a project written in Java and needed to implement our code that is unfortunately all written in C. So we needed a way to access this C/C++ code from Java. This is the way we went about solving this issue without rewriting our code in Java. But we had the benefit (?) of having already written our code in C.
This solution to write a bunch of extra code last minute in another language that I may or may not be familiar with doesn't sound like particularly fun time.
I would be curious to learn what possible problems others might see with this solution.
How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.
We inherited some leagcy code that has a whole lot of code copy/pasted across projects. Is there a way to find these? PMD can do a single project
Summary
There is also CloneDetective, Simian and Simscan. This paper from the International Conference on Software Engineering 2009 compares them, and PMD's CPD.
In detail
One tool that can handle several languages is CloneDetective (based on ConQuat, Continuous Quality Assessment Toolkit): ABAP, ADA, Java, C#, C/C++, Visual Basic, Cobol, PL1.
Another tool is Simian, the Similarity Analyser, which identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source code and even plain text files. It runs on JVM and .NET.
Actually, if you look at .NET, there are a lot of copy paste detection tools...
SimScan, the SimilarityScanner is an Eclipse/IDEA/JBUILDER plugin that finds duplicated or similar fragments of code in large Java source code bases. I don't know it, and have no idea what "similar fragments" means. It sounds like it might also just look isolatedly in single projects, but the IntelliJ-Screenshots look nifty.
This paper from the International Conference on Software Engineering 2009 compares CloneDetective, PMD's CPD, Simian and Simscan.
Just as PMD's copy & paste finder is actually called CPD for "copy paste detector", using that term as the terminus technicus for googling helps. Another term often used is "clone detection".
You could try using the command line version of PMD CPD:
http://pmd.sourceforge.net/cpd.html
You should be able to specify multiple source trees to check.
Simian, which is the other prominent copy/paste detector has similar command line capabilities.
See our Java CloneDR, a tool for finding duplicated code across large sets of code.
CloneDR finds exact and near-miss clones using the structure of the code (abstract syntax trees) as a guide, so it isn't confused by whitespace or comment changes. For detected clones, it shows you the clone instances, and a parameterized generalization that you can use as the basis of replacement abstraction (in Java, that's pretty much done by making a method; other languages have other techniques).
Another poster references a technical paper comparing clone detectors. If you examine the paper, reference number [1] is to CloneDR. The authors of that paper do not compare their detector against CloneDR, as their detector only uses tokens, not the more sophisticated method CloneDR has that uses language structure.
CloneDR works for a variety of languages: Java, C#, C++, COBOL, JavaScript, PHP, many others.
To handle multiple projects, you just tell CloneDR the set of files in all the projects.
If you can put those projects into one Eclipse workspace, Codepro Analytix will happily consume all of them together: https://code.google.com/javadevtools/codepro/doc/index.html
Sonar is pretty neat to do this kind of thing. I really like all the indicators you can have...
If you are looking for an Eclipse plugin, checkout UCDetector: Unnecessary Code Detector
Basically, I do lots of one-off code generation, large-scale refactorings, etc. etc. in Java.
My tool language of choice is Python, but I'll take whatever solutions you can offer.
Here is a simplified illustration of what I would like, in a pseudocode
Generating an implementation for an interface
search within my project:
for each Interface as iName:
write class(name=iName+"Impl", implements=iName)
search within the body of iName:
for each Method as mName:
write method(name=mName, body="// TODO implement this...")
Basically, the tool I'm searching for would allow me to:
parse files according to their Java structure ("search for interfaces")
search for words contextualized by language elements and types ("variables of type SomeClass", "doStuff() method calls on SomeClass instances")
to run searches with structural context ("within the body of the current result")
easily replace or generate code (with helpers to generate, as above, or functions for replacing, "rename the interface to Foo", "insert the line Blah.Blah()", etc.)
The point is, I don't want to spend a lot of time writing these things, as they are usually throwaway. But sometimes I need something just a little smarter than what grep offers. It wouldn't be too hard to write up a simplistic version of this, but if I'm going to use something like this at all, I'd expect it to be robust.
Any suggestions of a tool/library that will help me accomplish this?
Edit to add some clarification
Python is definitely not necessary; I'll take whatever is that. I merely suggest it incase there are choices.
This is to be used in combination with IDE refactoring; sometimes it just doesn't do everything I want.
In instances where I'm using for code generation (as above), it's for augmenting the output of other code generators. e.g. a library we use outputs a tonne of interfaces, and we need to make standard implementations of each one to mesh it to our codebase.
First, I am not aware of any tool or libraries implemented in Python that specifically designed for refactoring Java code, and a Google search did not give me any leads.
Second, I would posit that writing such a decent tool or library for refactoring Java in Python would be a large task. You would have to implement a Java compiler front-end (lexer/parser, AST builder and type analyser) in Python, then figure out how to integrate this with a program editor. I'm not surprised that nobody has done this ... given that mature alternatives already exist.
Thirdly, doing refactoring without a full analysis of the source code (but uses pattern matching for example) will be incapable of doing complex refactoring, and will is likely to make mistakes in edge cases that the implementor did not think of. I expect that is the level at which the OP is currently operating ...
Given that bleak outlook, what are the alternatives:
One alternative is to use one of the existing Java IDEs (e.g. NetBeans, Eclipse, IDEA. etc) as a refactoring tool. The OP won't be able to extend the capabilities of such a tool in Python code, but the chances are that he won't really need to. I expect that at least one of these IDEs does 95% of what he needs, and (if he is realistic) that should be good enough. Especially when you consider that IDEs have lots of incidental features that help make refactoring easier; e.g. structured editing, undo/redo, incremental compilation, intelligent code completion, intelligent searching, type and call hierarchy views, and so on.
(Aside ... if existing IDEs are not good enough (#WizardOfOdds - only the OP can make that call!!), it would make more sense to try to extend the refactoring capability of an existing IDE than start again in a different implementation language.)
Depending on what he is actually doing, model-driven code generation may be another alternative. For instance, if the refactoring is happening because he is frequently creating and recreating his object model(s), then an alternative is to code the models in some modeling language and generate his code from those models. My tool of choice when doing this kind of thing is Eclipse EMF and related technologies. The EMF technologies include generation of editors, XML serialization, persistence, queries, model to model transformation and so on. I have used EMF to implement and roll out projects with object models consisting of 50 to 100 distinct classes with complex relationships and validation requirements. EMF's support for merging source code edits when you regenerate from an updated model is a key feature.
If you are coding in Java, I strongly recommend that you use NetBeans IDE. It has this kind of refactoring support builtin. Eclipse also supports this kind of thing (although I prefer NetBeans). Both projects are open source, so if you want to see how they perform this refactoring, you can look at their source code.
Java has its fair share of criticism these days but in the area of tooling - it isn't justified.
We are spoiled for choice; Eclipse, Netbeans, Intellij are the big three IDEs. All of them offer excellent levels of searching and Refactoring. Eclipse has the edge on Netbeans I think and Intellij is often ahead of Eclipse
You can also use static analysis tools such as FindBugs, CheckTyle etc to find issues - i.e. excessively long methods and classes, overly complex code.
If you really want to leverage your Python skills - take a look at Jython. Its a Python interpreter written in Java.
HI All I am at the end of the release of my project.So in order to keep working our manager asked us to generate Class Diagrams for the code we had written.Its medium project with 3500 java files .So I think we need to generate class diagrams.First I need to know how reverse engineering works here. Also I looked for some tools in Google(Green, Violet) but not sure
whether they are of any help.Please suggest me how to proceed.Also a good beginning tutorial is appreciated.
I strongly recommend BOUML. Its Java reverse support is absolutely ROCK SOLID.
BOUML has many other advanteges:
it is extremely fast (fastest UML tool ever created, check out benchmarks),
has rock solid C++, Java, PHP and others import support,
it is multiplatform (Linux, Windows, other OSes),
has a great SVG export support, which is important, because viewing large graphs in vector format, which scales fast in e.g. Firefox, is very convenient (you can quickly switch between "birds eye" view and class detail view),
it is full featured, impressively intensively developed (look at development history, it's hard to believe that such fast progress is possible).
supports plugins, has modular architecture (this allows user contributions, looks like BOUML community is forming up)
The tool you want to use is Doxygen. It's similar to Javadoc, but works across multiple languages. If figures out the dependencies, and can call graphviz to render the class diagrams. Here's an example of a few Java classes run through Doxygen.
This is more a toolchain than a tool and I haven't tried it out myself. But it maybe a starting point. Using UMLGraph, ant and GraphViz. Explained step by step: in this article.
I ve used Visual Paradigm for UML for what you want to do and it was quite good.
See here for details.
Just go Tools -> Instant reverse and select your packages.
You may be able to reverse engineer class diagrams with the open source modelleing tool ArgoUML http://argouml.tigris.org/
ObjectAid is pretty nice. You can drag classes into a diagram and arrange them the way you want.
Visual Paradigm for UML Standard Edition (or Better) will reverse engineer Java files in to Class Diagrams.
I guess if your boss just wants to keep you busy until the next project starts then there's no harm in it, but you will find pretty quickly that creating a class diagram with 3500 classes will tell you exactly NOTHING about your system. In fact, you don't really want a diagram with more than about 10 classes on it. So once you have reversed all the code into your modelling tool, you will want to start organizing and arranging to find the meaning. Create a new diagram, drop a single important class onto it and bring in all the classes that are directly related to that class. Repeat for maybe the 300 most significant classes. Don't worry, it isn't as horrible as it sounds, maybe a week's work.
For the record, my modelling tool of choice is Enterprise Architect by Sparx Systems. It will reverse java sources or .jar files. There is a free 30 day trial edition.
There are some tools available that will help you generate these diagrams. These cost money.
Otherwise you could to try to parse your Java files. This could be as simple to create a simple parser that reads the Java files and writes the name of the class and all the import statements to a file and generates a class diagram from there, graphviz can help you there.
I've been using Enterprise Architect for a number of years. A JBoss developer suggested it to me. It works very well for all types of UML modeling including the reverse engineering of class models (Java, C# and others). The basic version is currently $120 per seat, but it has most of the capabilities of much more expensive tools and it is much easier to learn. I particularly like its ability to generate HTML and RTF documentation.
It is very easy to synchronize code between the tool and your source code. Even bi-directional if you want.
Your PM may also like the activity and sequence diagrams that it can create. I also frequently use the deployment diagrams. It's very helpful to have all of this in one tool.