Java Obscured Obfuscation - java

I guess the situation is pretty uncommon to begin with, and so I admit it is probably too localized for SO.
The Problem
public class bqf implements azj
{
...
public static float b = 0.0F;
...
public void b(...)
{
...
/* b, in both references below,
* is meant to be a class (in the
* default package)
*
* It is being obscured by field
* b on the right side of the
* expression.
*/
b var13 = b.a(var9, var2, new br());
...
}
}
The error is: cannot invoke a(aji, String, br) on primitive type float.
Compromisable limitations:
Field b cannot be renamed.
Class b cannot be renamed or refactored.
Why
I am modifying an obfuscated program. For irrelevant[?], unknown (to me), and uncompromisable reasons the modification must be done via patching the original jar with .class files. Hence, renaming the public field b or class b would require modifying much of the program. Because all of the classes are in the default package, refactoring class b would require me to modify every class which references b (much of the program). Nevertheless there is a substantial amount of modification I do intend on doing, and it is a pain to do it at the bytecode level; just not enough to warrant renaming/refactoring.
Possible Solutions
The most obvious one is to rename/refactor. There are thousands of classes, and every single one is in the default package. It seems like every java program I want to modify has that sort of obfuscation. : (
Anyways sometimes I do take the time to just go about manually renaming/refactoring the program. But when there are too many errors (I once did 18,000), this is not a viable option.
The second obvious option is to do it in bytecode (via ASM). Sometimes this is ok, when the modifications are small or simple enough. Unfortunately doing bytecode modifications on only the files which I can't compile through java (which is most of them, but this is what I usually try to do) is painfully slow by comparison.
Sometimes I can extend class b, and use that in my modified class. Obviously this won't always work, for example when b is an enum. Unfortunately this means a lot of extra classes.
It may be possible to create a class with static wrapper methods to avoid obscurity. I just thought of this.
A tool which remaps all of the names (not deobfuscate, just unique names), then unmaps them after you make modifications. That would be sweet. I should make one if it doesn't exist.
The problem would also be solved with a way to force the java compiler to require the keyword "this".

b.a(var9, var2, new br());
can easily be rewritten using reflection:
Class.forName("b").getMethod("a", argTypes...).invoke(null, var9, var2, new br());
The problem would also be solved with a way to force the java compiler to require the keyword "this".
I don't see how this would help you for a static member. The compiler would have to require us to qualify everything: basically, disallow simple names altogether except for locals.

Write a helper method elsewhere that invokes b.a(). You can then call that.
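A minimal sketch of that helper idea, with everything nested in one demo class so it compiles on its own. All names here (ObscuredNameDemo, BCalls, Patched, the String parameter) are hypothetical stand-ins, not the real obfuscated signatures. The point is only that the helper has no variable named b in scope, so there the simple name b resolves to the class:

```java
// All names below are hypothetical stand-ins for the obfuscated classes.
final class ObscuredNameDemo {
    // Plays the role of the obfuscated class b with its static method a(...).
    static class b {
        final String label;

        b(String label) {
            this.label = label;
        }

        static b a(String label) {
            return new b(label);
        }
    }

    // No field or local named b is in scope here, so `b` means the class.
    static final class BCalls {
        private BCalls() {
        }

        static b a(String label) {
            return b.a(label);
        }
    }

    // Plays the role of bqf: the float field b obscures class b in expressions.
    static class Patched {
        public static float b = 0.0F;

        String run() {
            // `b.a("x")` would not compile here: b resolves to the float field.
            b result = BCalls.a("x"); // the type position `b result` is unaffected
            return result.label;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Patched().run()); // prints "x"
    }
}
```

The cost is one forwarding method per static call you need, which is probably still less work than renaming across the whole jar.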
Note: In Java the convention is that the class would be named B and not b (which goes for bqf and azj too), and if that had been followed the problem would not have shown up.
The real, long-term cure is not to put classes in the default package.

Related

When we change the implementation of a method, do we have to recompile dependent classes?

Let's say that we have the following method in class TaxCalculator:
public double calculateTax(double income) {
return income * 0.3;
}
And we use this method in the Main class like this:
var calculator = new TaxCalculator();
double tax = calculator.calculateTax(100_000);
System.out.println(tax);
If I change the implementation of the calculateTax method to:
public double calculateTax(double income) {
return income * 0.4;
}
Do I need to recompile both the TaxCalculator class and the Main class?
I know this question sounds very silly, but I heard in a lecture that if we don't use interfaces, every change we make in tightly-coupled code (like the one I showed above) will force us to recompile all the classes that depend on the class we changed.
And this sounds weird to me, because the Main class doesn't know the implementation of the method we've made the change.
Thanks in advance for any help!
Yeah, that lecturer was just dead wrong. More generally, this is a very outdated notion that used to be a somewhat common refrain, and various lecturers still espouse it:
The idea that, if you have a class, you make a tandem interface that contains every (public) method in that class; given that the class already occupies the name of the concept, the interface can't be given a good name, and thus, an I is prefixed. You have a class Student and a matching interface IStudent.
Don't do that. It's a ton of typing, even if you use tools to auto-generate it, every time you change one you are then forced to change the other, and there is no point to it.
There are exotic and mostly irrelevant ways in which you get a 'more tight coupling' between a user of the Student class and the Student class code itself vs. having that user use IStudent instead. It sounds like either you or more likely perhaps the lecturer is confused and presumed that this tight coupling implies that any change in Student.java would thus require a recompile.
Furthermore, if those examples are from the lecture, oh boy. double is absolutely not at all acceptable for financial anything. That should most likely be an int or long, representing cents (or whatever passes for 'atomic monetary unit' for the currency in question; pennies for pounds, satoshis for bitcoin, yen for yen, and so on). In rare cases, BigDecimal. In any case, not, ever, double or float.
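To make the money point concrete, here is a small sketch. The class and method names (MoneyDemo, taxCents) are made up for illustration, and the truncating division is just one possible rounding policy, not a recommendation:

```java
import java.math.BigDecimal;

final class MoneyDemo {
    // Tax on an amount held in cents at a whole-percent rate.
    // Integer division truncates any fraction of a cent.
    static long taxCents(long incomeCents, int ratePercent) {
        return incomeCents * ratePercent / 100;
    }

    public static void main(String[] args) {
        // Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
        System.out.println(0.1 + 0.2 == 0.3); // prints false

        // 100,000.00 held as 10,000,000 cents, taxed at 30%.
        System.out.println(taxCents(10_000_000L, 30)); // prints 3000000

        // BigDecimal built from strings keeps decimal digits exact.
        System.out.println(new BigDecimal("0.10").add(new BigDecimal("0.20"))); // prints 0.30
    }
}
```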
You need to recompile B, where B uses something from A, if:
You change the value of a constant in A that is used directly by B, and that constant was 'CTC' (Compile Time Constant). Only the primitives and strings can be CTC, and they are CTC if the field is static final, and is immediately initialized (vs. initialized in a separate static {} block), and whose expression is itself CTC, which means it's comprised of literals and possibly simple operations between CTCs, e.g. in static final int a = 5; static final int b = a + 10;, b is also CTC. In contrast, e.g. static final long c = System.currentTimeMillis(); is not a compile time constant because System.currentTimeMillis() isn't, for obvious reasons.
You change a signature of any element in A that B uses. Even if the caller (B.java here) can be recompiled with zero changes. For example, you have in A.java: void foo(String param) and you change that to void foo(Object param). Even though foo("hello") is a valid call to either method, you still need to recompile here. Relevant elements are the name of the method, the types of the parameters (not the names), and the return type. Changing the exceptions you throw is fine. Deleting something in A that B used is, naturally, also something that'd require a recompile.
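The reason the foo(String) to foo(Object) change bites is that a compiled call site records the method name plus a descriptor built from the parameter and return types. A small sketch that prints those descriptors using java.lang.invoke.MethodType (the DescriptorDemo class and its helper are just illustration):

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    // Builds the JVM method descriptor for the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // What a caller compiled against void foo(String) looks for:
        System.out.println(descriptor(void.class, String.class)); // prints (Ljava/lang/String;)V
        // What void foo(Object) actually provides after the change:
        System.out.println(descriptor(void.class, Object.class)); // prints (Ljava/lang/Object;)V
    }
}
```

The two strings differ, so an old caller that is not recompiled no longer finds its method at link time, even though the source would still compile unchanged.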
And that's essentially it. The interjection of an interface doesn't meaningfully change this list - if that constant is in the interface, the same principle applies (if you change it, you have to recompile users of this constant), and if you change signatures, you'd have to change them in the interface as well, and we're back where we started.
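You can observe the constant inlining without a second compile step, because reading a compile time constant does not even trigger initialization of the class that declares it. A sketch, where the class names and the flag are invented for the demo:

```java
final class ConstantInliningDemo {
    // Set by the static initializer of Constants, so we can see when it runs.
    static boolean constantsInitialized = false;

    static class Constants {
        static final int RATE_PERCENT = 30;                       // compile time constant
        static final long LOADED_AT = System.currentTimeMillis(); // not a compile time constant

        static {
            constantsInitialized = true;
        }
    }

    // On the first call in a fresh JVM this returns "30 false true".
    static String report() {
        int rate = Constants.RATE_PERCENT;   // javac copies the literal 30 into this class
        boolean afterConstant = constantsInitialized;
        long loadedAt = Constants.LOADED_AT; // a real field read: initializes Constants
        boolean afterField = constantsInitialized;
        return rate + " " + afterConstant + " " + afterField;
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Because the 30 lives in the caller's own class file, changing RATE_PERCENT and recompiling only Constants would leave the caller using the old value.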
Adding an interface does have some generally irrelevant bonuses.
As a caveat, any such attempt must always answer the rather poignant question of: But how do callers make an instance? If the lecturer uses IStudent student = new Student(); they messed that up, and the few mostly irrelevant benefits of using an interface are gone.
If there are meaningfully different implementations available (quick rule of thumb: If you can come up with good names for all relevant types, this is the case), using an interface is 'correct' and none of this applies. For example, java.util.List is the interface, java.util.LinkedList and java.util.ArrayList are meaningfully different implementations of the same idea.
It's slightly easier to make an implementation of the interface specifically for test purposes. However, mocks and extending the class are generalized solutions to this problem too and usually work just as well, and more generally making a test-specific impl requires more care than just a rote application of the 'make a mirroring interface' principle.
You get an extra level of access - you can have public things in the class that nevertheless aren't mirrored in the interface, and thus, are not 'accessible' via the interface. There is a single good reason to make things public when they aren't really meant for external consumption: When you have a multi-package system. Java's module system acknowledges this too, and (via the 'exported package' concept) also introduces, effectively, another access level (a public thing in a non-exported package is not accessible from other modules, it's not as public as a public thing in an exported package). This is outdated, and there are ways around it even in a multi-package library, and it doesn't actually stop much - you cannot enforce callers to 'code to the interface' [1].
[1] Well, you can, but those are a bit clunky, and those would also stop another package in the same project, which was the whole point. You can use hacks to get around this, but if you're willing to use these hacks, you can just make those public, but not actually meant for external consumption non-public and use the same hackery.

What are 'real' and 'synthetic' Method parameters in Java?

Looking into j.l.r.Executable class I've found a method called hasRealParameterData() and from its name and code context I assume that it tells whether a particular method has 'real' or 'synthetic' params.
If I take e.g. method Object.wait(long, int) and call hasRealParameterData() it turns out that it returns false which is confusing to me, as the method is declared in Object class along with its params.
From this I've got a couple of questions:
What are 'real' and 'synthetic' Method parameters and why Java believes that params of Object.wait(long, int) are not 'real'?
How can I define a method with 'real' params?
Preamble - don't do this.
As I mentioned in the comments as well: This is a package private method. That means:
[A] It can change at any time, and code built based on assuming it is there will need continuous monitoring; any new java release means you may have to change things. You probably also need a framework if you want your code to be capable of running on multiple different VM versions. Maybe it'll never meaningfully change, but you have no guarantee so you're on the hook to investigate each and every JVM version released from here on out.
[B] It's undocumented by design. It may return weird things.
[C] The java module system restriction stuff is getting tighter every release; calling this method is hard, and will become harder over time.
Whatever made you think this method is the solution to some problem you're having - unlikely. If it does what you want at all, there are probably significantly better solutions available. I strongly advise you take one step backwards and ask a question about the problem you're trying to solve, instead of asking questions about this particular solution you've come up with.
Having gotten that out of the way...
Two different meanings
The problem here is that 'synthetic' means two utterly unrelated things and the docs use the two meanings interchangeably. Together with their opposites, that gives 4 meanings:
SYNTHETIC, the JVM flag. This term is in the JLS.
'real', a slang term used to indicate anything that is not marked with the JVM SYNTHETIC flag. This term is, as far as I know, not official. There isn't an official term other than simply 'not SYNTHETIC'.
Synthetic, as in, the parameter name (and other data not guaranteed to be available in class files) are synthesised.
Real, as in, not the previous bullet point's synthetic. The parameter is fully formed solely on the basis of what the class file contains.
The 'real' in hasRealParameterData is referring to the 4th bullet, not the second. But, all 4 bullet point meanings are used in various comments in the Executable.java source file!
The official meaning - the SYNTHETIC flag
The JVM has the notion of the synthetic flag.
This means it wasn't in the source code but javac had to make this element in order to make stuff work. This is done to paper over mismatches between java-the-language and java-the-VM-definition, as in, differences between .java and .class. Trivial example: At least until the nestmates concept, the notion of 'an inner class' simply does not exist at the class file level. There is simply no such thing. Instead, javac fakes it: It turns:
class Outer {
private static int foo() {
return 5;
}
class Inner {
void example() {
Outer.foo();
}
}
}
Into 2 seemingly unrelated classes, one named Outer, and one named Outer$Inner, literally like that. You can trivially observe this: Compile the above file and look at that - 2 class files, not one.
This leaves one problem: The JLS claims that inner classes get to call private members from their outer class. However, at the JVMS (class file) level, we turned these 2 classes into separate things, and thus, Outer$Inner cannot call foo. Now what? Well, javac generates a 'bridger' method. It basically compiles this instead:
class Outer {
private static int foo() {
return 5;
}
/* synthetic */ static int foo$() {
return foo();
}
}
class Outer$Inner {
private /* synthetic */ Outer enclosingInstance;
void example() {
Outer.foo$();
}
}
javac can generate fields and extra overload methods. For example, if you write class MyClass implements List<String> {}, you will write e.g. add(String x), but .add(Object x) still needs to exist to cater to erasure; that method is generated by javac, and will be marked with the SYNTHETIC modifier.
One effect of the SYNTHETIC modifier is that javac acts as if these methods do not exist. If you attempt to actually write Outer.foo$() in java code, it won't compile, javac will act as if the method does not exist. Even though it does. If you use bytebuddy or a hex editor to clear that flag in the class file, then javac will compile that code just fine.
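You can see the flag from reflection via Method.isSynthetic() and Method.isBridge(). A small sketch using Comparable instead of List to keep it short (the Name class is a made-up example):

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SyntheticFlagDemo {
    // Hypothetical example class. Only compareTo(Name) is written in source;
    // javac adds a compareTo(Object) overload to cater to erasure.
    static class Name implements Comparable<Name> {
        @Override
        public int compareTo(Name other) {
            return 0;
        }
    }

    // One line per declared compareTo overload, sorted for stable output.
    static List<String> describeCompareTo() {
        List<String> out = new ArrayList<>();
        for (Method m : Name.class.getDeclaredMethods()) {
            if (m.getName().equals("compareTo")) {
                out.add(m.getParameterTypes()[0].getSimpleName()
                        + " synthetic=" + m.isSynthetic()
                        + " bridge=" + m.isBridge());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // Name synthetic=false bridge=false
        // Object synthetic=true bridge=true
        for (String line : describeCompareTo()) {
            System.out.println(line);
        }
    }
}
```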
Generating parameter names
Weirdly, perhaps, in the original v1.0 Java Language Spec, parameter types were, obviously, a required part of a method's signature and are naturally encoded in class files. You can write this code: Integer.class.getMethods();, loop through until you find the static parseInt method, and then ask the j.l.r.Method instance about its parameter type, which will dutifully report: the first param's type is String. You can even ask it for its annotations.
But weirdly enough as per JLS 1.0 you cannot ask for its name - simply because it is not there, there was no actual need to know it, it does take up space, java wanted to be installed on tiny devices (I'm just guessing at the reasons here), so the info is not there. You can add it - as debug info, via the -g parameter, because having the names of things is convenient.
However, in later days this was deemed too annoying, and since Java 8 javac can stuff the param names in a class file, separately from the -g param to 'include debug symbol info'. As far as I know this is opt-in: you have to pass the -parameters option.
Which leaves one final question: java17 can still load classes produced by javac 1.1. So what is it supposed to do when you ask for the name of param1 of such a method? The name simply cannot be figured out, it simply isn't there in the class file. It can fall back to looking at the debug symbol table (and it does), but if that isn't there - then you're just out of luck.
What the JVM does is make that name arg0, arg1, etc. You may have seen this in decompiler outputs.
THAT is what the hasRealParameterData() method is referring to as 'real' - arg0 is 'synthesized', and in contrast, foo (the actual name of the param) is 'real'.
So how would one have a method that has 'real' data in that sense (the 4th bullet)? Compile it with a javac that records the param names (as far as I know, that means passing -parameters). Some obfuscators strip them again. If you compile without that option, or with a really old -target, and definitely don't add -g, you'll probably get non-real, as per hasRealParameterData().
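There is a public way to ask roughly the same question without touching the package private method: Parameter.isNamePresent(). A hedged sketch follows; the ParameterNameDemo class is invented here, and whether a name is present depends on how the class was compiled (to my knowledge javac only records names when invoked with -parameters), so the sketch deliberately does not promise a particular output:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

final class ParameterNameDemo {
    // Describes each parameter and whether its name came from the class file.
    static String describe(Method m) {
        StringBuilder sb = new StringBuilder();
        for (Parameter p : m.getParameters()) {
            sb.append(p.getType().getSimpleName()).append(' ').append(p.getName())
              .append(p.isNamePresent() ? " (from class file)" : " (synthesized)")
              .append("; ");
        }
        return sb.toString();
    }

    // The method from the question: Object.wait(long, int).
    static Method waitMethod() {
        try {
            return Object.class.getMethod("wait", long.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // When no name is recorded, getName() falls back to arg0, arg1, ...
        System.out.println(describe(waitMethod()));
    }
}
```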

Comparing two .jars with different obfuscation

I need to compare two jar files that have many of the same classes but with different names.
Let's say you are looking for a class that contains this:
public class AStar {
private int verbose = 0;
private int maxSteps = -1;
private int numSearchSteps;
public ISearchNode bestNodeAfterSearch;
etc..., but it's obfuscated into
public class ard {
private int fas = 0;
private int asd = -1;
private int ags;
public ars arser;
and you have to compare the first file against 100 of others to find this one.
My guess was a byte code comparison, but I can't find a tool for it or a method to compare all files against each other in the two jars.
I've done this in the past, but the problem is that generally a lot of manual work is also required to determine the type of information that is preserved, and which libraries to compare it with.
For example, in one case, I found that the obfuscated Jar had added a method to a library class which threw off the comparison until I found and accounted for it. Another common problem is that obfuscators will remove unused methods and interfaces and sometimes add obfuscator-specific methods.
In order to get good results, you can't just consider individual classes. You need to match up inheritance hierarchies, interfaces, and cross references between the classes in order to unambiguously match most classes, and even then it isn't always successful.
Luckily, they almost never reorder or change the signatures of the fields and methods. Otherwise it would be extremely difficult to collect enough information to unambiguously match up the classes. As it is, there are often classes with the exact same set of methods and inheritance (for example two classes that implement the same interface). If you're lucky, you'll be able to infer it by matching references from a third class, but this isn't always possible.
Anyway, I can send you my code if you want. It's designed for the recognition of open source libraries included in an obfuscated app, but it could probably be adapted to match two obfuscated apps as well.
You should be able to pull this off with ASM. It has pretty good documentation, and quite some samples.
You build an internal model from the types and values, and then compare and spit out the identical classes.
If it was you who obfuscated it, you should be able to get the mappings though...
In the general case, determining whether two arbitrary programs do the same thing for all inputs is undecidable (the halting problem reduces to it).
For the following, I'll assume the obfuscation doesn't mess with the class structure: it will only rename fields, methods and classes and possibly obfuscate bytecode.
Let's say you're looking for an obfuscated class that's equivalent to some class C. Here are some searches you could perform, in increasing order of difficulty:
Find all classes with the exact same number of fields and methods as C has.
For each obfuscated class, compute the set of field types it contains (but, for simplicity, don't include types that point to other obfuscated classes). All classes where this set of field types is not a subset of the field types of C can be filtered out.
Do the same for method signatures.
You could go further but it could get pretty complicated.
In the end, what works best depends on what specific things the obfuscator does and does not try to hide.
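A rough sketch of the field-type filter using plain reflection. The nested classes are hypothetical stand-ins modelled on the question, and it compares for exact equality of the sorted type lists rather than the subset test described above, which is a simplification:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FieldShapeDemo {
    // Hypothetical stand-ins for the original class and its obfuscated twin.
    static class Node {}
    static class AStar {
        private int verbose = 0;
        private int maxSteps = -1;
        private int numSearchSteps;
        public Node bestNodeAfterSearch;
    }
    static class ars {}
    static class ard {
        private int fas = 0;
        private int asd = -1;
        private int ags;
        public ars arser;
    }
    static class Unrelated {
        private String name;
        private int size;
    }

    // Sorted field types. Anything that is not a primitive or a java.* type is
    // masked as "?" because its name may be obfuscated (arrays are masked too).
    static List<String> fieldTypes(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            Class<?> t = f.getType();
            boolean stable = t.isPrimitive() || t.getName().startsWith("java.");
            out.add(stable ? t.getName() : "?");
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fieldTypes(AStar.class).equals(fieldTypes(ard.class)));       // prints true
        System.out.println(fieldTypes(AStar.class).equals(fieldTypes(Unrelated.class))); // prints false
    }
}
```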
ASM is a good library for parsing and processing .class files.
If the obfuscation changed only variable names, and not variable order or any of the compiler-generated bytecode, you should be able to do this with ASM or Javassist or other bytecode library. In fact, the list below can be done using regular Java reflection.
Two class files would be candidates for equality if:
They have the same number of methods
There is a 1-to-1 mapping between the parameter signatures of the methods in class A and class B
The matching methods also match in terms of flags (private/public, static, abstract, etc.)
That would be a pretty good match. Beyond that, you might have to get into the details of the byte code. The byte code should be similar, but references to the Const Pool might be scrambled. You would have to decipher those. For example one class might ldc #12 and the other might ldc #34; if it turns out that #12 in class A is the same as #34 in class B, they match (at least for that).
If the obfuscator rewires the order of parameters on private methods, it might be really hard to detect a match easily. Still, maybe all you need to do is to narrow it down to a reasonable number of candidates, so applying the list above to public and protected methods might be all you need.
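The candidate test above (same methods by parameter signature and flags, ignoring names) could look roughly like this with plain reflection. The classes Search, p, q and Other are invented for the demo, and masking every non-JDK type as "?" is my assumption about what survives renaming:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MethodShapeDemo {
    // Hypothetical original class and an obfuscated twin with renamed members.
    static class Node {}
    static class Search {
        public int run(String start, int maxSteps) { return 0; }
        private static void reset() {}
        Node best() { return null; }
    }
    static class q {}
    static class p {
        public int a(String x, int y) { return 0; }
        private static void b() {}
        q c() { return null; }
    }
    static class Other {
        public int a(String x) { return 0; }
    }

    // Primitive and java.* type names are kept; everything else may be renamed.
    static String mask(Class<?> t) {
        return (t.isPrimitive() || t.getName().startsWith("java.")) ? t.getName() : "?";
    }

    // Name-independent description of each method: flags, return type, parameter types.
    static List<String> methodShapes(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            StringBuilder sb = new StringBuilder(Modifier.toString(m.getModifiers()));
            sb.append(' ').append(mask(m.getReturnType())).append('(');
            for (Class<?> param : m.getParameterTypes()) {
                sb.append(mask(param)).append(',');
            }
            out.add(sb.append(')').toString());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(methodShapes(Search.class).equals(methodShapes(p.class)));     // prints true
        System.out.println(methodShapes(Search.class).equals(methodShapes(Other.class))); // prints false
    }
}
```

For classes you can only read from a jar without loading them, the same idea would have to be ported to a bytecode library such as ASM.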
I use Beyond Compare to compare jar files:
http://www.scootersoftware.com/
You may have some luck using their additional file formats to compare .class files (decompiled)
http://www.scootersoftware.com/download.php?zz=kb_moreformats_win

Why does MyClass.class exist in Java while MyField.field doesn't?

Let's say I have:
class A {
Integer b;
void c() {}
}
Why does Java have this syntax: A.class, and doesn't have a syntax like this: b.field, c.method?
Is there any use that is so common for class literals?
The A.class syntax looks like a field access, but in fact it is a result of a special syntax rule in a context where normal field access is simply not allowed; i.e. where A is a class name.
Here is what the grammar in the JLS says:
Primary:
ParExpression
NonWildcardTypeArguments (ExplicitGenericInvocationSuffix | this Arguments)
this [Arguments]
super SuperSuffix
Literal
new Creator
Identifier { . Identifier }[ IdentifierSuffix]
BasicType {[]} .class
void.class
Note that there is no equivalent syntax for field or method.
(Aside: The grammar allows b.field, but the JLS states that b.field means the contents of a field named "field" ... and it is a compilation error if no such field exists. Ditto for c.method, with the addition that a field c must exist. So neither of these constructs mean what you want them to mean ... )
Why does this limitation exist? Well, I guess because the Java language designers did not see the need to clutter up the language syntax / semantics to support convenient access to the Field and Method objects. (See * below for some of the problems of changing Java to allow what you want.)
Java reflection is not designed to be easy to use. In Java, it is best practice to use static typing where possible. It is more efficient, and less fragile. Limit your use of reflection to the few cases where static typing simply won't work.
This may irk you if you are used to programming in a language where everything is dynamic. But you are better off not fighting it.
Is there any use that is so common for class literals?
I guess, the main reason they supported this for classes is that it avoids programs calling Class.forName("some horrible string") each time you need to do something reflectively. You could call it a compromise / small concession to usability for reflection.
I guess the other reason is that the <type>.class syntax didn't break anything, because class was already a keyword. (IIRC, the syntax was added in Java 1.1.)
* If the language designers tried to retrofit support for this kind of thing there would be all sorts of problems:
The changes would introduce ambiguities into the language, making compilation and other parser-dependent tasks harder.
The changes would undoubtedly break existing code, whether or not method and field were turned into keywords.
You cannot treat b.field as an implicit object attribute, because it doesn't apply to objects. Rather b.field would need to apply to field / attribute identifiers. But unless we make field a reserved word, we have the anomalous situation that you can create a field called field but you cannot refer to it in Java sourcecode.
For c.method, there is the problem that there can be multiple visible methods called c. A second issue is that if there is a field called c and a method called c, then c.method could be a reference to a field called method on the object referred to by the c field.
I take it you want this info for logging and such. It is most unfortunate that such information is not available although the compiler has full access to such information.
With a little creativity you can get the information using reflection. I can't provide any examples, as there are few requirements to go on and I'm not in the mood to completely waste my time :)
I'm not sure if I fully understand your question. You are being unclear in what you mean by A.class syntax. You can use the reflections API to get the class from a given object by:
A a = new A();
Class<?> c = a.getClass();
or
Class<?> c = A.class;
Then do some things using c.
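For instance, here is a sketch of getting at the field b and the method c from the question by name, which is the closest thing Java offers to b.field or c.method. The wrapper class and helper names are just for the demo:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ClassLiteralDemo {
    // The class from the question, nested here so the sketch is self-contained.
    static class A {
        Integer b;

        void c() {
        }
    }

    // Looks up field b by name; a typo in the string only fails at runtime.
    static Field fieldB() {
        try {
            return A.class.getDeclaredField("b");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up the no-argument method c by name.
    static Method methodC() {
        try {
            return A.class.getDeclaredMethod("c");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> fromLiteral = A.class;
        Class<?> fromInstance = new A().getClass();
        System.out.println(fromLiteral == fromInstance);     // prints true
        System.out.println(fieldB().getType().getName());    // prints java.lang.Integer
        System.out.println(methodC().getReturnType());       // prints void
    }
}
```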
The reflections API is mostly used for debugging tools, since Java has support for polymorphism, you can always know the actual Class of an object at runtime, so the reflections API was developed to help debug problems (sub-class given, when super-class behavior is expected, etc.).
The reason there is no b.field or c.method, is because they have no meaning and no functional purpose in Java. You cannot create a reference to a method, and a field cannot change its type at runtime, these things are set at compile-time. Java is a very rigid language, without much in the way of runtime-flexibility (unless you use dynamic class loading, but even then you need some information on the loaded objects). If you have come from a flexible language like Ruby or Javascript, then you might find Java a little controlling for your tastes.
However, having the compiler help you figure out potential problems in your code is very helpful.
In Java, not everything is an object.
You can have
A a = new A();
Class<?> cls = a.getClass();
or directly from the class
A.class
With this you get the object for the class.
With reflection you can get methods and fields but this gets complicated. Since not everything is an object. This is not a language like Scala or Ruby where everything is an object.
Reflection tutorial : http://download.oracle.com/javase/tutorial/reflect/index.html
BTW: You did not specify the public/private/protected , so by default your things are declared package private. This is package level protected access http://download.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html

Anonymous class binary names

I have the following problem:
1) There is some abstract class A with several anonymous subclasses stored in the static fields of A. There is circular dependency between two of the anonymous subclasses. The code of that abstract class is similar to the following:
class A implements Serializable
{
public static final A _1 = new A() {
public A foo()
{
return _2;
}
};
public static final A _2 = new A() {
public A foo()
{
return _1;
}
};
public static final A _3 = new A() {
public void bar()
{
// do something
}
};
}
2) Instances of class A are referenced by other objects which are used in serialization. There are some objects which are pre-serialized by developers and then included into release as binary data.
After some refactoring of class A, the binary names of its anonymous subclasses were changed in the release builds. I think this may be due to a difference in java compiler versions. From .class files made on my machine I can see that anonymous subclasses of A stored in _1, _2 and _3 fields have names A$1, A$2 and A$3, respectively, but from .class files taken from the release build I can see that anonymous subclasses of A stored in _1, _2 and _3 fields have names A$2, A$3 and A$1, respectively. Due to this the pre-serialized data became unusable and I need to fix this somehow.
Are there any specifications for java compilers or JVM which will say what binary names I should expect for my anonymous classes? The JLS says that name of anonymous class should be name of enclosing class, "$"-sign and non-empty sequence of digits without setting any constraints on these sequences.
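To check which names a given compiler actually assigned, something like the following sketch can be run against each build. It is a cut-down version of the class above, nested in a demo class of my own naming; the exact digits it prints are up to the compiler, which is the whole problem:

```java
final class AnonymousNameDemo {
    // Reduced version of class A from the question.
    static abstract class A {
        static final A _1 = new A() {
            A foo() {
                return _2;
            }
        };
        static final A _2 = new A() {
            A foo() {
                return _1;
            }
        };

        abstract A foo();
    }

    // Binary name of the runtime class, e.g. something ending in $1 or $2.
    static String binaryName(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("_1 -> " + binaryName(A._1));
        System.out.println("_2 -> " + binaryName(A._2));
    }
}
```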
I believe that I shouldn't rely on internal names of anonymous classes, I also know "proper" ways to fix that problem like generating pre-serialized data on the build server. Too bad we don't have much time for this now, so I want to know from where this naming difference comes, so I could fix this issue now.
May I dare to challenge some elements ? Hopefully it can be useful to you :
if you want your classes to have a well-known name ... well, anonymous is the contrary of a named class ! ;-)
preserializing and delivering objects as binary data is a dangerous choice, and you got bitten by it (during a refactoring, but I believe that could happen in many other conditions). Serialized data is usually considered as a short term solution in Java, good for a few seconds. Many other options are available for longer term storage.
Now, if asked to solve your short-term problem, the only approach I see is to restore your classes to a state compatible with the previous version. If the different ordering you mention is the only difference, I believe that defining the anonymous classes in the same order as before is worth trying ! Also take care that references should be backwards (to a class earlier in the file), not forward (to a class later in the file).
The only reason I can guess why it fails is that the new Java version reorders the class names because you reference _2 in _1. That said, I don't think you can rely on the names since Java makes no guarantees in which order it will process fields of a class (and therefore, the sequence in which it will create inner classes).
But I think your problem is somewhere else. What error do you get?
Did your compiler not give any warnings?
I believe you can read the data without relying on the anonymous class names in the current code by overriding ObjectInputStream.readClassDescriptor. Replace with a descriptor of a "compatible" class. No guarantees that will work, but may be worth a try if your data is important.
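A sketch of what that override could look like. OldName and NewName are hypothetical stand-ins for a class whose binary name changed (for the anonymous classes you would map the old A$n name to the class that now holds that code), and the swap only has a chance of working when both classes have the same serialVersionUID and the same serialized fields:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class RenamedClassStreamDemo {
    // Hypothetical classes: same layout and serialVersionUID, different names.
    static class OldName implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;
    }
    static class NewName implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Swaps the descriptor read from the stream for that of a replacement class.
    static final class RemappingInputStream extends ObjectInputStream {
        private final Map<String, Class<?>> replacements;

        RemappingInputStream(InputStream in, Map<String, Class<?>> replacements) throws IOException {
            super(in);
            this.replacements = replacements;
        }

        @Override
        protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
            ObjectStreamClass fromStream = super.readClassDescriptor();
            Class<?> replacement = replacements.get(fromStream.getName());
            return replacement == null ? fromStream : ObjectStreamClass.lookup(replacement);
        }
    }

    // Serializes an OldName and reads the bytes back as a NewName.
    static Object writeOldReadNew() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new OldName());
            }
            Map<String, Class<?>> map = new HashMap<>();
            map.put(OldName.class.getName(), NewName.class);
            try (ObjectInputStream in =
                     new RemappingInputStream(new ByteArrayInputStream(bytes.toByteArray()), map)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object o = writeOldReadNew();
        System.out.println(o.getClass().getSimpleName() + " " + ((NewName) o).value);
    }
}
```

As the answer says, no guarantees: if the field layout of the renamed class differs from what is in the stream, the read will fail or produce garbage.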
