Java decompiler: how does it work? [duplicate]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
So is a decompiler really a thing that gives the source of a compiled/interpreted piece of code? Because to me that sounds impossible. How would you get the names of the functions, variables, classes, etc. if it is compiled? Or am I misinterpreting the definition? How does it work? And what is the general principle behind making one?

You're right about your definition of a decompiler: it takes a compiled application and produces source code to match. However, in most cases it does not know the names and structure of variables, functions, or classes; it just guesses. It analyzes the flow of the program and tries to find a way to represent that flow in a certain programming language, typically C. However, because the programming language of choice (C, in this example) is often at a higher level than the state of the underlying program (a binary executable), some parts of the program might be impossible to represent accurately; in this case, the decompiler would fail and you would need to use a disassembler. This is why many people like to obfuscate their code: it makes it much harder for decompilers to make sense of it.
Building a decompiler is not a simple task. Basically, you have to take the application that you are decompiling (be it an executable or some other form of compiled application) and parse it into some kind of tree you can work with in memory. You would then analyze the flow of the program and try to find patterns that might suggest that an if statement, variable, function, etc. was used at a certain location in the code. It's all really just a guessing game: you'd have to know the patterns that the compiler produces in compiled code, then search for those patterns and replace them with equivalent human-readable source code.
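To make the pattern-matching idea concrete, here is a toy sketch. The instruction format (LABEL, IFFALSE, GOTO, STMT) and the class name LoopRecognizer are invented for illustration; a real decompiler works on an actual control-flow graph rather than a flat list. It recognizes the jump shape a compiler typically emits for a while loop and prints it back as structured code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration, not a real decompiler: the instruction format is invented.
final class LoopRecognizer {

    /** A hypothetical low-level instruction. */
    static final class Insn {
        final String op;     // "LABEL", "IFFALSE", "GOTO" or "STMT"
        final String arg;    // label name, condition text, or statement text
        final String target; // jump target label for IFFALSE/GOTO, otherwise null

        Insn(String op, String arg, String target) {
            this.op = op;
            this.arg = arg;
            this.target = target;
        }
    }

    /** Turns a flat instruction list into indented source lines. */
    static List<String> decompile(List<Insn> code) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            int backEdge = matchWhile(code, i);
            if (backEdge >= 0) {
                // Pattern found: LABEL top; IFFALSE cond -> end; body; GOTO top; LABEL end
                out.add("while (" + code.get(i + 1).arg + ") {");
                for (String line : decompile(code.subList(i + 2, backEdge))) {
                    out.add("    " + line); // body is decompiled recursively (nested loops)
                }
                out.add("}");
                i = backEdge + 2; // skip the GOTO and the exit LABEL
            } else {
                Insn in = code.get(i);
                out.add(in.op.equals("STMT") ? in.arg + ";" : "/* unstructured " + in.op + " */");
                i++;
            }
        }
        return out;
    }

    /** Returns the index of the loop's back-edge GOTO if a while loop starts at i, else -1. */
    static int matchWhile(List<Insn> code, int i) {
        if (i + 1 >= code.size()) return -1;
        Insn top = code.get(i);
        Insn test = code.get(i + 1);
        if (!top.op.equals("LABEL") || !test.op.equals("IFFALSE")) return -1;
        for (int j = i + 2; j + 1 < code.size(); j++) {
            Insn back = code.get(j);
            Insn exit = code.get(j + 1);
            if (back.op.equals("GOTO") && top.arg.equals(back.target)
                    && exit.op.equals("LABEL") && exit.arg.equals(test.target)) {
                return j;
            }
        }
        return -1;
    }
}
```

For the input STMT "int i = 0", LABEL L1, IFFALSE "i < 10" -> L2, STMT "i++", GOTO L1, LABEL L2, decompile returns the lines "int i = 0;", "while (i < 10) {", "    i++;", "}". Anything that matches no known pattern is left as an /* unstructured ... */ comment, which is roughly the point where a real decompiler gives up and you fall back to a disassembler.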
This is all much simpler for higher-level platforms like Java or .NET, where you don't have to deal with assembly instructions, and things like variables are mostly taken care of for you. There, you don't have to guess so much as directly translate. You might not have the exact variable or method names, but you can at least deduce the program structure fairly easily.
Disclaimer: I have never written a decompiler and thus don't know every detail of what I'm talking about. If you are really interested in writing a decompiler, you should get a book on the topic.

A decompiler basically takes the machine code and turns it back into the language it was written in. If I'm not mistaken, the decompiler needs to know what language the code was compiled from; otherwise it won't work.
The basic purpose of a decompiler is to get back your source code; for example, one time my Java file got corrupted and the only thing I could do to bring it back was to use a decompiler (since the class file wasn't corrupted).

It works by deducing a "reasonable" (based on some heuristics) representation of what's in the object code. The degree of resemblance between what it produces and what was originally there tends to depend heavily upon how much information is contained in the binary it starts from. If you start with basically a "pure" binary, it's generally stuck with just making up "reasonable" names for the variables, such as i, j and k for loop indexes, and longer names for most others.
On the other hand, a language that supports introspection needs to embed a great deal more information about variable names, types, etc., into the executable. In a case like this, decompiling can produce something much closer to the original, such as typically retaining the original names for functions, variables, etc. In such a case, the decompiler can often produce something quite similar to the original -- possibly losing little more than formatting and comments.
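For the "pure binary" case, the name invention can be as simple as the following sketch. The class NameGuesser and its rules (i, j, k for loop indexes, otherwise a lower-cased type name plus a counter) are assumptions for illustration, not the scheme of any particular decompiler:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical naming heuristic, not taken from any particular decompiler.
final class NameGuesser {
    private static final String LOOP_NAMES = "ijklmn";
    private int loopCount = 0;
    private final Map<String, Integer> perType = new HashMap<>();

    /** Invents a readable name for a local whose real name was not kept in the binary. */
    String nameFor(String typeName, boolean isLoopIndex) {
        if (isLoopIndex && loopCount < LOOP_NAMES.length()) {
            return String.valueOf(LOOP_NAMES.charAt(loopCount++));
        }
        // "java.lang.String" -> "string", "int[]" -> "intArray"
        String simple = typeName.substring(typeName.lastIndexOf('.') + 1)
                .toLowerCase(Locale.ROOT)
                .replace("[]", "Array");
        int n = perType.merge(simple, 1, Integer::sum); // 1 on first use, then 2, 3, ...
        return simple + n;
    }
}
```

Asking one NameGuesser for two loop indexes, two Strings and an int[] yields i, j, string1, string2 and intArray1, which is the flavor of output you get when the binary has no names to offer.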

That depends on what language you are decompiling. If you are decompiling something like C or C++, then the only information provided to you is function names and arguments (in DLLs). If you are dealing with Java, the class file always contains field and method names, and javac records line numbers by default; local variable names are included only when the code was compiled with debug information (-g). If there are no variable names, you get generated names like localInt1, localInt2, localException1, or whatever naming scheme the decompiler uses. The decompiler can also approximate the original line layout from the line numbers.
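You can check which names survive in a Java class file with plain reflection. This sketch (the class NamesInBytecode is hypothetical) lists the declared methods of a class with their parameter names. Method names always come through; parameter names come back as arg0, arg1, ... unless the class was compiled with javac -parameters. Local variable names are not visible through reflection at all; javap -l shows them when the class was compiled with -g.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: inspects its own compiled form via reflection.
final class NamesInBytecode {
    static int add(int first, int second) {
        return first + second;
    }

    /** Returns one "name(Type param, ...)" line per declared method. */
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            StringBuilder sb = new StringBuilder(m.getName()).append('(');
            Parameter[] ps = m.getParameters();
            for (int k = 0; k < ps.length; k++) {
                if (k > 0) sb.append(", ");
                // getName() falls back to "argN" when the class file has no
                // MethodParameters attribute (see Parameter.isNamePresent()).
                sb.append(ps[k].getType().getSimpleName()).append(' ').append(ps[k].getName());
            }
            out.add(sb.append(')').toString());
        }
        return out;
    }

    public static void main(String[] args) {
        describe(NamesInBytecode.class).forEach(System.out::println);
    }
}
```

Compiled with plain javac, the add line typically comes out as add(int arg0, int arg1); with javac -parameters it shows add(int first, int second). The method name itself is there either way.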

Related

Why do I get a $ sign in decompiled java classes full with errors? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I decompiled my lost APK and got all the resources and all the Java classes, but the problem is that all the .java files have errors and dollar signs in them. I know these are anonymous Java classes, but how do I fix these errors? I can't find a way to do that... Please help.

What you're seeing is Java's internal representation of the anonymous inner classes. Java implements these by creating classes with generated names, which -- like all inner classes -- are based on adding $ and a suffix to the name of the containing class. (There are some other changes made to support the inner class's ability to refer back to its containing context.)
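A quick way to see those generated names is to ask the classes themselves. In this sketch (class names are made up, default package assumed), javac names the member class AnonNames$Inner and the single anonymous class AnonNames$1, which are exactly the names a decompiler finds on disk as AnonNames$Inner.class and AnonNames$1.class:

```java
// Hypothetical demo: prints the binary names javac assigns to nested classes.
final class AnonNames {
    static final class Inner { }

    static Runnable makeTask() {
        // The only anonymous class in AnonNames, so javac names it AnonNames$1.
        return new Runnable() {
            @Override
            public void run() { }
        };
    }

    public static void main(String[] args) {
        System.out.println(Inner.class.getName());                     // AnonNames$Inner
        System.out.println(makeTask().getClass().getName());           // AnonNames$1
        System.out.println(makeTask().getClass().isAnonymousClass());  // true
    }
}
```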
Apparently whichever decompiler you used didn't attempt to reverse that rewrite. That isn't very surprising; I haven't yet seen one that did handle this situation correctly, and some don't even handle constructors correctly (leaving them with their generated method name, <init>). Unfortunately, compilation of any language always involves discarding some information, and decompilation will generally not be able to reconstruct the original code -- and may not be able to reconstruct syntactically correct code, since the object code is generally allowed to do things that the source language can't. You should expect to have to manually edit the decompiler's output.
If you're just trying to get the code running, you may be able to do so by replacing the generated class and method names with ones that are acceptable to Java syntax (as opposed to the less-restricted JVM). If you actually want to turn the generated classes back into anonymous inner classes, you'll have to do that manually as well.
Or you can try to find a decompiler which is better at handling this case and isn't worse at handling other cases. Good luck; if you do find one, let us know.
(The real answer here is to be extremely careful not to lose your source code. The best thing that can be said about decompilers is that they're usually better than trying to directly read the instructions.)

How to inject bytecode to a compiled java program without using tools? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to learn how I can create an injectable piece of java bytecode, and inject it into an already compiled java program so it will run when the said program is executed.
It doesn't have to be dynamic and in runtime, just given a compiled java program inject additional code into it.
Now, I know there are many existing tools for this, like Javassist and ASM. But the act itself isn't my goal; I want to learn how it's done, so I want to learn how to do this without these tools.
For example: How to strip excess code from the source bytecode, where to inject it into the target code, etc.
The best answer would be one or more simple pieces of source or pseudo-code.
After learning and successfully doing this I'm going to start searching info on how to do this to Linux executable binaries, so adding in more information on that way would also be very helpful and appreciated.

First off, Java classfiles are essentially immutable once loaded, so what you're really asking is how to create and modify classfiles by hand.
The answer is to read the JVM specification. That's how I got started with bytecode. After reading the specs, I wrote a couple simple classfiles by hand in a hex editor and played around with it to see how things worked. Of course, that's not practical for normal usage, so I later wrote an assembler. It's not that hard.
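As a starting point for reading the spec, here is a minimal sketch that parses the beginning of a class file (magic number, version, constant pool) as laid out in chapter 4 of the JVM specification. The class name ClassFileSketch is hypothetical; it only keeps the CONSTANT_Utf8 entries, which is where class, method, and field names live, and it stops after the constant pool:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads only the header and constant pool of a class file.
final class ClassFileSketch {
    final int major;
    final List<String> utf8 = new ArrayList<>();

    ClassFileSketch(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();              // minor_version
        major = in.readUnsignedShort();      // major_version (52 = Java 8)
        int count = in.readUnsignedShort();  // constant_pool_count = number of entries + 1
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                      // Utf8: u2 length + modified UTF-8, as readUTF expects
                    utf8.add(in.readUTF());
                    break;
                case 3: case 4:              // Integer, Float
                    in.skipBytes(4);
                    break;
                case 5: case 6:              // Long, Double occupy two constant pool slots
                    in.skipBytes(8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    in.skipBytes(2);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    in.skipBytes(4);
                    break;
                case 15:                     // MethodHandle: u1 kind + u2 index
                    in.skipBytes(3);
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
    }
}
```

For a real file you would pass java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("Foo.class")) and look at utf8 to see the names and descriptors used by Foo. A hand-made byte array holding just the magic number, version 52, and a two-entry constant pool (Utf8 "Foo" plus a Class entry pointing at it) parses to major == 52 and utf8 == ["Foo"].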
Incidentally, the source code for my assembler is only around 1k lines of code, so it's a lot less to sort through than Javassist.

Javassist is essentially decompiling and compiling the code, that is why there is a lot of code there.
And you won't find the type of code injection you are looking for in Javassist. So "Go read Javassist" is a rather stupid suggestion.
If you want to put your code into a specific place (for instance, at the start of a method or in a constructor), you can see how to find the spot by reading the JVM docs.
However as Antimony mentioned, you are looking for bytecode knowledge, so here it is:
http://arhipov.blogspot.com.au/2011/01/java-bytecode-fundamentals.html
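If you want to inject a piece of bytecode, you can just find the start of your main() and paste the code there (you will still need to update the method's code_length, possibly its max_stack, and the offsets recorded in its exception table and attributes such as StackMapTable). It will be 200-300 LOC MAX.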
With linux binaries it is much easier, read this:
http://www.skyfree.org/linux/references/ELF_Format.pdf

Typically, this is done by reading the bytecode (or Linux executable), transforming it into some form of intermediate representation (IR), performing additional transformations on the IR, and converting it back into the original format.
If you transform the IR back into Java source code, you will get a decompiler for Java. If you perform analysis on the IR, manipulate it (addition, removal, rewriting, etc.) and convert the transformed IR back to the bytecode format, you will get what you described.
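A toy version of that pipeline shows why an IR helps. In this sketch the IR (class ToyIr, its opcodes and sizes) is invented for illustration and only loosely mimics JVM bytecode: branches name labels instead of byte offsets, so a transformation can insert instructions anywhere, and branch offsets are recomputed only when the IR is lowered back to addresses. A real tool would also have to recompute max_stack, the exception table, and StackMapTable frames.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy IR invented for illustration; it only loosely mimics JVM bytecode.
final class ToyIr {

    static final class Insn {
        final String op;    // "label", "goto", "ifeq", or any other (1-byte) opcode
        final String label; // label name for "label", "goto" and "ifeq"; null otherwise

        Insn(String op, String label) {
            this.op = op;
            this.label = label;
        }

        boolean isLabel()  { return op.equals("label"); }
        boolean isBranch() { return op.equals("goto") || op.equals("ifeq"); }

        // Labels take no space; branches are opcode + 2-byte offset, like JVM goto/ifeq.
        int size() { return isLabel() ? 0 : isBranch() ? 3 : 1; }
    }

    /** IR transformation: insert extra instructions right after the given label. */
    static List<Insn> insertAfterLabel(List<Insn> code, String label, List<Insn> extra) {
        List<Insn> out = new ArrayList<>();
        for (Insn in : code) {
            out.add(in);
            if (in.isLabel() && in.label.equals(label)) out.addAll(extra);
        }
        return out;
    }

    /** Lowering: assigns addresses, then resolves each branch to an offset relative to itself. */
    static List<String> lower(List<Insn> code) {
        Map<String, Integer> labelPc = new HashMap<>();
        int pc = 0;
        for (Insn in : code) {                        // pass 1: addresses
            if (in.isLabel()) labelPc.put(in.label, pc);
            pc += in.size();
        }
        List<String> out = new ArrayList<>();
        pc = 0;
        for (Insn in : code) {                        // pass 2: branch offsets
            if (in.isBranch()) {
                Integer target = labelPc.get(in.label);
                if (target == null) throw new IllegalArgumentException("undefined label " + in.label);
                out.add(in.op + "@" + pc + " -> " + (target - pc));
            } else if (!in.isLabel()) {
                out.add(in.op + "@" + pc);
            }
            pc += in.size();
        }
        return out;
    }
}
```

Lowering the loop label top; iload_0; ifeq end; nop; goto top; label end; return gives goto@5 -> -5. After inserting iconst_1; pop right after top, the same call gives goto@7 -> -7, with no manual offset patching in the transformation itself.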
For detailed algorithms on how to convert bytecode to an IR, you can refer to Section 3.3 of http://suif.stanford.edu/~jwhaley/papers/mastersthesis.pdf and Section 3 of https://courses.cs.washington.edu/courses/cse501/01wi/project/sable-thesis.pdf

What is a popular JavaDoc practice for ASCII-art documentation? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm working on a project written in Java, designed to transmit data via a messaging system that strictly defines the bit position of the message fields. This means we have an entire library of dictionary classes designed to bit-shift object input data to and from the message's binary representation. This library is reasonably large and, because the protocol is still young, tends to be tweaked and changed every year or so.
The JavaDoc for this library provides ASCII art tables and diagrams that explain what a particular method expects as input (or output). These tables are exceedingly important because finding the documentation and verifying that the method actually does what the document says can be time-consuming and prone to error. Following a single, simple ASCII representation of the bit shifting makes this a lot easier.
I have a coworker who insists that ASCII art does not belong in JavaDoc (even with <pre> tags), and furthermore that we configure Eclipse to automatically format the code on save. He offers two options to reformat the documentation:
Embed an image.
Use an HTML table.
The image would be okay, except Eclipse doesn't render SVG images. It is completely unacceptable to me that we maintain an SVG image and then export the image as PNG to our documentation repo, and then link the PNG with HTML. The amount of maintenance involved in that scenario seems completely crazy. Who is responsible for making sure all the PNG, SVG, and code are synchronized?? Furthermore, obviously, the data won't be readable without the image.
The HTML table option is bad for two reasons. First, the Eclipse formatter puts each tag and value on its own line, which means every single value takes up three lines. It leaves huge gaps in the source code, and the result is completely unreadable without rendering the HTML. To make matters worse, some of our tables are complex, and troubleshooting HTML tables is not my idea of a responsible thing to require of developers who already resist creating documentation.
So if my coworker is right about "Java people" not using ASCII diagrams for documentation, what is a standard industry practice that gives us a method for preserving these diagrams? What benefit does this method have over using <pre> tags with ASCII diagrams? Bonus points if you can answer why JavaDoc hasn't evolved to provide readable markup, instead of relying on HTML.
Edit: I just found markdown-doclet. I don't know if this will be an acceptable compromise or not. Maybe there are other tools that work similarly?

An old question, but I have had similar frustrations.
You can use the /*- construct to prevent Eclipse from formatting a given comment. See: https://stackoverflow.com/a/5466173.
Use the {@code} construct and/or <pre>. See: https://stackoverflow.com/a/542142. I suppose someone who argues against ASCII diagrams in general would argue against these, too. But perhaps their having been enshrined in Javadoc syntax will be a point in your favor.
Point out that even the Java developers use ASCII diagrams where appropriate.
You could also tell your fellow to use a better editor. No, no, I troll (: ...A little.
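For reference, the <pre>{@code ...}</pre> combination suggested above could look like this on one of the bit-shifting dictionary classes. The 16-bit header layout and the class HeaderBits are made up for the example; {@code} keeps characters like < from being read as HTML, and <pre> keeps the column alignment in the rendered Javadoc:

```java
/**
 * Accessors for a hypothetical 16-bit message header (layout invented for this example).
 *
 * <pre>{@code
 * bit  15     12 11            4 3      0
 *     +---------+---------------+--------+
 *     | version |   msg type    | flags  |
 *     +---------+---------------+--------+
 *       4 bits       8 bits       4 bits
 * }</pre>
 */
final class HeaderBits {
    /** Bits 15..12. */
    static int version(int header) { return (header >>> 12) & 0xF; }

    /** Bits 11..4. */
    static int msgType(int header) { return (header >>> 4) & 0xFF; }

    /** Bits 3..0. */
    static int flags(int header) { return header & 0xF; }

    /** Packs the three fields back into a header, masking each to its width. */
    static int pack(int version, int msgType, int flags) {
        return ((version & 0xF) << 12) | ((msgType & 0xFF) << 4) | (flags & 0xF);
    }
}
```

HeaderBits.pack(3, 0xA5, 9) gives 0x3A59, and the three accessors take it apart again, so the diagram sits right next to the code that has to agree with it.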

At the company we've decided on ASCII diagrams for the primary reasons given already, and we believe they are more than enough to justify this choice:
Maintenance cost and feasibility. I've seen projects with outdated external resources... it's almost inevitable.
Displayed anywhere (IDE, text editor). We don't produce Javadoc for internal projects and put them on a web server. Development habits have changed... see http://www.flowstopper.org/2014/12/graphical-visualizations-in-javadoc.html
ASCII diagrams also force one to keep it simple, which usually helps for clarity.
I've found http://www.asciidraw.com/ to be a great tool for this purpose.

Is Java Code obfuscation actually effective vs decompilers? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am curious enough to be considering not even writing certain code in Java because of how easy it is to decompile. Is there a way that I can write in Java and not have to worry about decompilers? I understand anything can be reverse engineered given enough time, so what I am asking is: are Java class obfuscators effective enough to deter decompilation?

are Java class obfuscators effective enough to deter decompilation?
I would say "no". When I decompile source code with the intent of trying to figure out how someone did something, I already know what I'm looking for. So I don't have to understand the entire program -- just the one piece that's of interest to me at the time. With enough puzzling over methods and backtracking a bit up the call chain, it's usually possible to determine what's under the hood without an excessive amount of effort.

If your question is "Can I ensure that no one can hack my code?", the answer is NO, whether it is in Java or Visual C++, as long as your software (which is made up of bytes) is directly accessible to the hacker.
The reason is simple: however you encode it, it can be decoded.
The best strategy could be to make a web service and deploy your secret logic there.
Let others use your service without having access to how you wrote it.
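A minimal sketch of that approach, using the JDK's built-in com.sun.net.httpserver package. The class name SecretService, the /score endpoint, and the formula are all hypothetical stand-ins for your real logic. Clients send an input and get a result back; the bytecode for secretScore never leaves your server. There is no authentication, TLS, or rate limiting here, so treat it as an illustration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical service: the "secret" logic runs only on the server.
final class SecretService {
    // Stand-in for the logic you don't want to ship in a jar.
    private static long secretScore(long x) {
        return x * x + 42;
    }

    /** Starts the server on 127.0.0.1 (port 0 picks a free port). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/score", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // expects "x=<number>"
            byte[] body;
            int status;
            try {
                long x = Long.parseLong(query.substring(query.indexOf('=') + 1));
                body = Long.toString(secretScore(x)).getBytes(StandardCharsets.UTF_8);
                status = 200;
            } catch (RuntimeException e) { // missing or malformed query
                body = "bad request".getBytes(StandardCharsets.UTF_8);
                status = 400;
            }
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Started with SecretService.start(8080), a request to http://127.0.0.1:8080/score?x=5 returns 67; the client only ever sees that number.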

Obfuscation, in Java and other languages, is just a deterrent. It simply raises the bar for the attacker. That doesn't mean obfuscation has no value, it just isn't a guarantee.
What are you trying to protect and what type of market are you targeting ?
Obfuscation to protect a license algorithm in a market that is full of piracy isn't going to mean that much. However, for SMB, it may be enough to cut out most of the casual pirates.
If you are trying to protect IP from competition, I see two answers. The idea will be hard to protect: a capable engineer looking at the code will figure out the gems of the logic and be able to reimplement them. Obfuscation, however, will make it a lot harder for people to just pick up the code and include it in their own product. Their maintenance costs will continue to grow as they attempt to make changes (I'd say that is also true for cleanly decompiled code).
The java products I develop for my company are obfuscated. Have they protected us from theft...I doubt it. But, in the context of our development costs, the obfuscation wasn't that expensive. A small bit of protection for a small price isn't a bad trade-off.

From personal experience decompiling Java, I will say that obfuscation can make someone's attempts to decompile very, very irritating and difficult. The most irritating to me is when the final build's class files are all named "a.class, b.class, c.class" and so on, and a large number of dummy classes are thrown in. In terms of in-code obfuscation, try/catch blocks do a fine job of messing things up for the decompiler.
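The a.class, b.class, c.class naming comes from a renaming pass. Here is a sketch of just the name generation; the class ShortNames is hypothetical, and a real obfuscator (ProGuard, for instance) also rewrites every reference to the renamed classes and keeps entry points such as the class holding main under their original names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of obfuscator-style short names.
final class ShortNames {
    /** 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ... (bijective base 26). */
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            sb.append((char) ('a' + (n - 1) % 26));
        }
        return sb.reverse().toString();
    }

    /** Maps each distinct class name to the next short name, in input order. */
    static Map<String, String> renameAll(List<String> classNames) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : classNames) {
            mapping.putIfAbsent(name, shortName(mapping.size()));
        }
        return mapping;
    }
}
```

renameAll on com.acme.LicenseCheck and com.acme.Crypto maps them to a and b; once every reference is rewritten, that mapping (which the obfuscator keeps only on the developer's side) is the only place the meaningful names survive.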
In general, anything you decompile will not be compilable, but will give you hints as to the general workings of the program.

"Effective enough" depends entirely on how effective you need it to be. And that depends on what you are protecting, and from whom. None of the conventional methods (obfuscation, encrypting the bytecodes, compiling to an "exe") will stop a skilled and determined attacker with enough time and incentive. But that pretty much applies to all forms of programming. (You can disassemble or decompile C/C++ apps as well ...)
The only way you can protect against a serious reverse engineering effort is to use a secure execution platform; e.g. using something based on TPM. Even then, if the bad guys can attach a logic analyser to a system running your code, they can (in theory) capture the native code being executed and then start on the reverse engineering path.
EDIT : Someone has reportedly succeeded in breaking a popular TPM chip, using an electron microscope; see this Register article. And interestingly, his original motivation was to hack Xbox 360 consoles!

Frankly speaking, no. No matter how ridiculously you obfuscate the code, if someone knows he can make a million dollars out of your code, he will decompile your class files and get the code.
There are alternatives though:
Convert your Java program to an exe before distributing it. You must know that there are catches here.
Encrypt your class files with a key. Make a custom classloader that can decrypt the class files using the key before loading them into memory. There are two problems here: a) load time increases, b) how will you hide the key?
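Here is a sketch of the second alternative, using AES-GCM from the standard javax.crypto API. The class name DecryptingClassLoader and the map of encrypted class blobs are assumptions for illustration. Note that it does nothing about problem b): the key has to be inside the running JVM, and so do the decrypted bytes passed to defineClass, so anyone who can attach a debugger or a Java agent to the process can recover the plain class files.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical loader: class bytes are stored encrypted and decrypted on demand.
final class DecryptingClassLoader extends ClassLoader {
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey key;                  // must live in memory: the weak spot
    private final Map<String, byte[]> encrypted;  // binary class name -> IV + ciphertext

    DecryptingClassLoader(ClassLoader parent, SecretKey key, Map<String, byte[]> encrypted) {
        super(parent);
        this.key = key;
        this.encrypted = new HashMap<>(encrypted);
    }

    /** Build-time step: returns a random IV followed by the AES-GCM ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
        return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] blob = encrypted.get(name);
        if (blob == null) throw new ClassNotFoundException(name);
        try {
            byte[] classBytes = decrypt(key, blob); // plaintext bytecode now sits in memory
            return defineClass(name, classBytes, 0, classBytes.length);
        } catch (GeneralSecurityException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

At build time you would call encrypt(key, bytesOfSomeClassFile) for each class and ship only the resulting blobs; at run time loader.loadClass("com.example.Secret") decrypts and defines the class, while JDK classes are still loaded through the parent loader. The extra decryption per class is where the load-time cost in problem a) comes from.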

If you read my post https://stackoverflow.com/a/26717791/2132826 you will see that I couldn't find one good Java de-obfuscator that actually works as expected.
So the current answer is: NO.

Does a Java to C++ converter/tool exist? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 8 years ago.
I have always wondered whether it would be possible to make a Java to C++ converter.
Maybe a tool that converts the Java syntax to the C++ syntax?
I am aware that the languages differ, but I mean simple things like loops, where the semantics match one to one.
Is there such a tool? Or is it possible to make one?

It's possible to do anything given enough time, money and resources. Is it practical? Beyond trivial examples not really. Or rather it depends on what constitutes an acceptable error rate.
The real problem is that the idioms differ between Java and C++. Java to C#, for example, would actually be far easier (because the idioms are much more similar). The biggest difference, of course, is that C++ has destructors and manually managed memory. Java uses finally blocks for this kind of behaviour and has garbage collection.
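To make the idiom gap concrete, this is the Java side: cleanup tied to a scope through try-with-resources (which the compiler expands into finally-style cleanup code). The classes are hypothetical. A converter would have to map each close() to something like a C++ destructor, and decide what to do with all the ordinary garbage-collected objects that have no close() at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating scope-based cleanup in Java.
final class CleanupIdiom {
    static final List<String> EVENTS = new ArrayList<>();

    /** Stands in for a file, socket, or lock that must be released deterministically. */
    static final class Handle implements AutoCloseable {
        private final String name;

        Handle(String name) {
            this.name = name;
            EVENTS.add("open " + name);
        }

        @Override
        public void close() { // what a C++ translation would likely put in ~Handle()
            EVENTS.add("close " + name);
        }
    }

    static void useTwo() {
        // try-with-resources closes b, then a, even if the body throws.
        try (Handle a = new Handle("a"); Handle b = new Handle("b")) {
            EVENTS.add("work");
        }
    }
}
```

After useTwo(), EVENTS holds open a, open b, work, close b, close a: the same reverse-order release a C++ programmer would get from destructors of two stack objects.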
Also Java has a common Object supertype. C++ doesn't.
Translating generics into templates would be nigh on impossible, I would imagine.
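One reason generics and templates do not line up: Java erases type arguments at compile time, so List<String> and List<Integer> share a single runtime class and a single copy of the code, whereas C++ instantiates a template separately for each type argument. This small check (the class name is hypothetical) shows the erasure side:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of type erasure.
final class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // Both are plain java.util.ArrayList at run time; the type arguments are gone.
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```

A translator therefore has to pick between emitting one template instantiation per type argument, which changes the code size and the runtime picture, or a single Object-based version, which throws away most of what templates offer.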

The Firefox HTML5 parser is written in Java and converted to C++. But I think the converter used there is quite specific for this project. Interestingly, it turned out the resulting C++ parser was faster than the old parser written in C++.
I'm also writing a converter as part of the H2 database, under src/tools/org/h2/java. The idea is to allow converting a subset of the H2 database to C++, so this is also not a general-purpose translator.
And there is the open source project J2C.
So there are ways to convert Java to C++. But don't expect the translator to support all features, and don't expect the resulting code to be any faster than on a good JVM.

It is possible, no question, but it won't be that simple. It would be a Java compiler that generates C++.
If you want to do that from scratch, it will be very hard, you have to do all the work javac and the JVM do for you (e.g. garbage collection).
Btw. Google has a Java to JavaScript compiler (included in GWT)

There is one, but I am not sure if it actually works:
Java to C++ Converter, by Tangible Software Solutions.
It is weird how there are C++ to Java converters, but only one Java to C++ converter.

As said, it would be tough to convert Java to C++, but we can have an application or tool that generates code in Java and equivalent C++ code.
I know one application which generates code in C++/Java/C# given a model, which has its own way to define it.
That tool belongs to CA and name is CA Plex.
Search on www.ca.com

There are programs out there that claim they can do this, but none have gained enough popularity to be frequently mentioned, so we'll leave them at "attempts". Making a converter would require a lot of AI built into your program. The difficulty is increased tenfold when Swing is involved, because GTK, wxWidgets, Qt and the Win32 API all differ greatly from Swing. But it is possible. Not that the code quality will be great, and there's no guarantee your program won't crash due to the different memory handling methods, but it's possible.

The main issue is that Java is a language that is written and designed to talk to a VM. I suppose it would be possible, but all you would be left with is a very poorly optimized application with a self-translating layer doing what the VM already does. I mean, sure, it is possible, but it still wouldn't be a solution for anything I could think of. If you're looking to make your sluggish Java app native, maybe you're thinking too hard; just use an application like JET. It's actually quite good, and will give you the benefits a native app would bring. Of course, if the VM is already doing what the app is asking it to do just as well as native code could (it happens... sometimes :P), it might change nothing.
Java to C#, though, sounds more reasonable, as both languages are written in similar ways, talking to a framework as such, but this would still leave the code very much unoptimized, as code written from scratch for a particular framework cannot be bested.

http://www.tangiblesoftwaresolutions.com/Order/Order_Upgrade_Instant_CPlus_Java_Edition.htm
Depends on the domain of where the code will be used, from a learning perspective perhaps it might be interesting.
I just found this via a Google search, as I remembered seeing one at university that created code based on UML.

Java to C would actually be the easiest. Remember you need to convert the language, If you do that, the required libraries can be converted by your new compiler. In other words Swing and AWT should not be a big problem...
I would start by taking a good look at the Java Native Interface (JNI). The JNI is a part of Java which allows it to be used with C and C++. The reason I would start here is that it becomes fairly obvious how parts of Java may be implemented in C. Once I had a grasp on basic structures, like how Java objects can be mapped onto C structures (struct) and how pretty much everything in Java is an Object, including arrays, I might peek at the OpenJDK source code.
The actual converter would have to convert all the imported Java libraries (and their imported libraries and so on...), which means you would need the source code for everything. This conversion is no small task, since the Java libraries are large.
The process would be time-consuming, but no AI should be required. However, I see no reason to perform a conversion like this. It loses the portability of Java and would not gain the efficiency of C (except that it would be compiled to native code, but it would be better to compile to machine code directly from the Java).

Something neat would be a tool that translates Java to "C++ using the Java API" (like GNU GCJ's CNI); one remaining problem is handling array.length (for plain arrays, not vectors)...


Java is a programming language and computing platform first released by Sun Microsystems in 1995.