I am trying to develop a tool that analyses an Android app's APK and counts the number of calls to a specific API method (e.g., android.app.AlarmManager.set()).
1. What approach do you recommend?
So far I have used APKTool and now I have .smali files.
However, for the same java source, I can have multiple files:
ExportAsyncTask$1.smali
ExportAsyncTask$2.smali
ExportAsyncTask$3.smali
ExportAsyncTask$4.smali
2. What do these multiple files mean?
3. The resulting .smali files also include external libraries that I would like to leave out of the analysis. How can I do that?
1. What approach do you recommend?
Yes, Apktool can be used for your task. Each Java class in the APK is represented by a .smali file, in a directory tree that mirrors the packages. Smali is a human-readable assembly form of the bytecode that runs on the Android virtual machine (Dalvik/ART). It is much simpler than Java and hence easier to analyse. In your case you should search for invoke opcodes and Landroid/app/AlarmManager;->set strings. On Linux you can, for example, grep for them and count the matches. On Windows, text editors such as Notepad++ can search across multiple files.
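If you would rather do the counting in a program than with grep, here is a minimal sketch in Java. It assumes the tree of .smali files produced by Apktool is passed as the first argument. The class and method names are made up for illustration, and the trailing "(" in the search string is my own addition so that setRepeating() and similar methods are not counted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Apktool.
final class SmaliCallCounter {

    /** Counts invoke-* instructions in the given smali text that reference the target method. */
    static long countCalls(String smaliText, String target) {
        return Arrays.stream(smaliText.split("\n"))
                .map(String::trim)
                // Only instructions count; string constants mentioning the name are skipped.
                .filter(line -> line.startsWith("invoke-") && line.contains(target))
                .count();
    }

    /** Sums the counts over every .smali file below the given directory. */
    static long countCallsInTree(Path root, String target) throws IOException {
        List<Path> smaliFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            smaliFiles = paths
                    .filter(p -> p.toString().endsWith(".smali"))
                    .collect(Collectors.toList());
        }
        long total = 0;
        for (Path file : smaliFiles) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            total += countCalls(text, target);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // The "(" restricts the match to set(...) itself (assumption, see above).
        String target = "Landroid/app/AlarmManager;->set(";
        System.out.println(countCallsInTree(root, target));
    }
}
```

Note that this counts call sites in the code, not how often the method runs, and it will miss calls made through reflection.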
2. What do these multiple files mean?
In Java there are named inner classes and anonymous inner classes. The former appear as OuterClass$InnerClass.smali; the latter, having no name of their own, get numbers, like OuterClass$1.smali. You sometimes even get deeper nesting, like a$b$c$1.smali.
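To see where such files come from, here is a small illustrative Java class (the names are invented for the example). Compiling it yields one class file per class, and Apktool would show a matching .smali file for each.

```java
final class OuterClass {

    // Named nested class: has its own name, so it becomes OuterClass$InnerClass.
    static class InnerClass { }

    // Anonymous class: has no name of its own, so the compiler numbers it (OuterClass$1).
    static final Runnable ANONYMOUS = new Runnable() {
        @Override
        public void run() { }
    };

    public static void main(String[] args) {
        // Prints the binary names the compiler assigned to the two classes.
        System.out.println(InnerClass.class.getName());
        System.out.println(ANONYMOUS.getClass().getName());
    }
}
```

So the four ExportAsyncTask$N.smali files most likely correspond to four anonymous classes (listeners, callbacks and the like) declared inside ExportAsyncTask.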
3. The resulting .smali files also include external libraries that I would like to leave out of the analysis. How can I do that?
There is no precise way to do that. But generally speaking, once you have looked at the package/directory trees of many samples, you usually grasp the pattern. E.g., application code is usually in a com.* package, while android.*, org.* and uk.* contain libraries. After such an inspection you simply exclude those directories from your search.
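One way to apply such an exclusion list programmatically is sketched below. The class name and the prefixes are placeholders taken from the example above; you would adjust the list after inspecting your own APKs.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter; the prefixes are only the examples from the answer above.
final class LibraryFilter {

    // Top-level package directories assumed to contain libraries.
    static final List<String> EXCLUDED_PREFIXES = Arrays.asList("android", "org", "uk");

    /** Returns true if the file, relative to the smali root, is not under an excluded package. */
    static boolean isApplicationCode(Path smaliRoot, Path file) {
        Path relative = smaliRoot.relativize(file);
        // Path.startsWith compares whole name elements, so "androidx" does not match "android".
        return EXCLUDED_PREFIXES.stream().noneMatch(relative::startsWith);
    }
}
```

A prefix such as "com/google" can be added to the list in the same way if a library lives below com.*.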
Related
I am generating Java code from Haxe code and I want to strip down the generated java files.
This basically means, that I want to delete specific functions from them. It is the same function from all java files.
And I want to do that every time after I compile the Haxe files, so I need automation. I am looking into sed, but I am not even sure it can be done with it. I would need to find the end of the function somehow.
Or does anyone know another tool suited for this?
Get hold of one of the bytecode manipulation packages, such as ASM. Read the docs and sample programs. Write a program that automates the code modification you're interested in.
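Since the question is about finding the end of a function in the generated .java files, a purely textual alternative to the bytecode route is to count braces. The following is only a rough sketch of that idea, not ASM and not a real parser. The class and method names are invented, and it assumes the method body contains no braces inside string literals, character literals or comments.

```java
// Hypothetical helper illustrating brace counting on Java source text.
final class MethodStripper {

    /**
     * Removes the first method whose header contains the given signature text.
     * Returns the source unchanged if the signature or a balanced body is not found.
     */
    static String removeMethod(String source, String signature) {
        int start = source.indexOf(signature);
        if (start < 0) {
            return source;
        }
        int open = source.indexOf('{', start);
        if (open < 0) {
            return source;
        }
        int depth = 0;
        for (int i = open; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // i is the brace that closes the method body.
                    return source.substring(0, start) + source.substring(i + 1);
                }
            }
        }
        return source; // unbalanced braces: leave the file alone
    }
}
```

For anything beyond very regular generated code, a real Java parser is the safer choice.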
Do you know of a C++ library (open source, or free for non-commercial use) that can parse Java source codes, for example from a jar file or defined classpath? I want to extract classes, class members, methods, method calls and relations between these artifacts.
I've spent all day googling for a solution. Either I'm blind, or can't read! :)
You can't get source code from a jar file, since that is really a set of (binary) class files. Assuming you mean the source code that might have been used to produce a jar file, then there's a decent answer.
If you want an open source solution, you can try ANTLR, which has a Java 1.5 grammar and AFAIK will build an AST. From that you can "extract" the trees for the items you want, or at least the line numbers for the subtree of interest; from there, you can extract the code you want.
I believe ANTLR can be configured to produce a C++-based parser.
To capture relations between these, you need full name and type resolution, so you know which definition an identifier actually references. For this, ANTLR being just a parser won't do the trick; you need to live a Life After Parsing.
An alternative might be the Java compiler; it offers some kind of API.
There are a number of decompilers available for Java. These aren't based on C++ necessarily, but they can convert Java classes and libraries back into source.
Examples: JD Core, DJ Java Decompiler.
I'm writing an applet, which uses ~10 external libraries. Together they occupy more than 2 megabytes. In some libs we use only 1-2 classes, so a lot of others can be safely deleted. So the question is how to remove unused classes from jar libraries?
A lot of other questions link to Proguard. But it doesn't process libraries (or I am doing something wrong) and also ruins parts of code which use reflection.
You could use the maven-shade-plugin and tell it to build a minimized jar file that combines your code and libs.
You could use something like ClassDep, which statically identifies which classes you will use.
However it's possible to easily fool this. Imagine some of your code contains:
Class.forName(className);
so you can dynamically build a classname and load that class. Tools like ClassDep can't identify these cases, so you'd need to perform comprehensive testing on your shrunken jars.
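A tiny illustration of the problem, with invented names: the class name below only exists as a complete string at run time, so a static dependency scan of this code sees no reference to the class that ends up being loaded.

```java
// Hypothetical example of a dependency that static analysis cannot see.
final class DynamicLoader {

    /** Builds a class name from its parts at run time and loads it reflectively. */
    static Class<?> load(String packageName, String simpleName) throws ClassNotFoundException {
        String className = packageName + "." + simpleName;
        return Class.forName(className);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // The parts could just as well come from a config file or user input.
        System.out.println(load("java.util", "ArrayList"));
    }
}
```

If the loaded class lived in one of your shrunken jars and had been removed, this would fail only at run time with a ClassNotFoundException.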
ProGuard can process your code together with the libraries (with the option -injars). You can still keep external libraries that you don't want to process (with the option -libraryjars).
Any automatic shrinking tool will have problems with reflection. ProGuard recognizes some basic reflection and it allows you to specify the parts of the internal API that should be preserved for the sake of reflection. ProGuard supports some powerful configuration, but depending on the amount of reflection in the libraries, it may still require trial and error.
You can simply "unzip" the JARs, take only the classes you want from each, and place them in a custom archive. Brian A. gave a good suggestion on how to identify those classes and some caveats. I would add that you may be violating licenses as well...
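The unzip-and-repack step can also be scripted. Below is a minimal sketch using the standard java.util.zip classes. The class name and the idea of passing the wanted entry names as a set are my own assumptions; how you obtain that set (by hand, or from a dependency tool) is up to you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper that copies only selected entries from one jar into another.
final class JarSlimmer {

    /** Copies the entries whose names are in 'wanted' (e.g. "com/lib/Foo.class") to outputJar. */
    static void copySelected(Path inputJar, Path outputJar, Set<String> wanted) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(inputJar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(outputJar))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!wanted.contains(entry.getName())) {
                    continue; // skipped entries are simply not written
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
    }
}
```

Remember to include META-INF/MANIFEST.MF in the set if the library relies on it, and note that signed jars will no longer verify after repacking.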
I'm thinking about trying to convert a Scons (Python) script to another build system but was wondering if there was a Python-analysis library available in order to 'interrogate' the Scons/Python script?
What I'm [possibly] after is something along the lines of Java's reflection mechanism, in fact, if this is possible via say Jython/Java, coding in Java, that would be best for me as a Java dev (I have no real background in Python).
What I need to be able to do is extract the variable assignment values etc. for certain named class types and methods within the script, so that I can transfer them to my new output format.
Any ideas?
Thanks
Rich
If your current scons files are very regular and consistent it may be easier to do something "dumb" with standard text-editing tools. If you want to get smarter, you should notice that scons is itself a Python program, and it loads your build files which are also Python. So you could make your own "special" version of scons which implements the functions your build scripts use (to add programs, libraries, whatever). Then you could run your build scripts in your "fake" scons program and have your functions dump their arguments in a format suitable for your new build system.
In other words, don't think of the problem in terms of analyzing the Python grammar completely--realize that you can actually run your build scripts as Python code and hijack their behavior.
Easier said than done, I'm sure.
I doubt it's the best tool for migrating scons, but Python's inspect module offers some reflection facilities. For the rest, you can simply poke inside live classes and objects: Python has some data hiding but does not enforce access restrictions.
So, I've been programming for a while now, but since I haven't worked on many larger, modular projects, I haven't come across this issue before.
I know what a .dll is in C++, and how they are used. But every time I've seen similar things in Java, they've always been packaged with source code. For instance, what would I do if I wanted to give a Java library to someone else, but not expose the source code? Instead of the source, I would just give a library as well as a Javadoc, or something along those lines, with the public methods/functions, to another programmer who could then implement them in their own Java code.
For instance, if I wanted to create a SAX parser that could be "borrowed" by another programmer, but (for some reason--can't think of one in this specific example lol) I don't want to expose my source. Maybe there's a login involved that I don't want exploited--I don't know.
But what would be the Java way of doing this? With C++, .dll files make it much easier, but I have never run into a Java equivalent so far. (I'm pretty new to Java, and a pretty new "real-world" programmer, in general as well)
A Java .jar library is the Java equivalent of a .dll, and it also has "Jar hell", which is the Java version of "dll hell".
http://en.wikipedia.org/wiki/JAR_(file_format)
Google JAR files.
Edit: Wikipedia sums it up nicely: http://en.wikipedia.org/wiki/JAR_%28file_format%29
Software developers generally use .jar files to distribute Java applications or libraries...
A jar is just a zip archive of your compiled classes. All classes can be easily decompiled and viewed. If you really don't want to share your code, you might want to look at obfuscating it.
The Java analog to a DLL is the .jar file, which is a zip file containing a bunch of Java .class files and (perhaps) other resources. See Sun's, er, Oracle's documentation.
Java's simple motto is 'Write Once, Run Anywhere'. Package all your Java classes as a jar file, but be aware that someone can still see the Java code by using a decompiler. To keep people from reading your code, obfuscate the jar; see the link below.
Java Obfuscation
You could publish a collection of compiled *.class files.
The most common way to package up Java code is to use a ".jar" file. A .jar file is basically just a .zip file.
To distribute just your compiled code, you'll want to build a .jar that contains your .class files. If you want to additionally distribute the source code, you can include the .java files in a separate area of the .jar.
There are a lot of tools and tutorials out there that explain how to build a .jar.
Technically, you can compile Java bytecode down to native code and create a conventional DLL or shared library using an Ahead-Of-Time compiler.
However, that DLL would need the Java runtime specific to the AOT compiler, and two Java runtimes may not coexist in one process. Also, one would have to employ JNI to make any use of that DLL.
Unfortunately, obfuscation has too many weaknesses...
Your title doesn't match your comment....
Simply ship a source jar and a code jar. But, as other people pointed out, you can obfuscate the code if you don't want people to read it. That is a pain for other people using your library, as they would need the mappings and the obfuscator in order to compile.
A dll is a shared library (from what I have read, it is loaded once and shared across multiple processes).
A jar is a shared library too, but its code is loaded separately by each process from the same file.
So, to answer your title question, there doesn't appear to be an equivalent built into Java. A library could be written, and supported on all three major OSes, to provide a dll-like mechanism in Java. But the reason Java creates a new instance per program is security and sanity: custom class loaders, ASM and reflection let a program modify classes on load, so if your program does any of these things it could mess up other processes.
You don't have to distribute your source code. You can distribute compiled .class files, which contain human-unreadable bytecode. You can bundle them into .jar files, which are just zip files, and are roughly the Java equivalent of native .dll files.
Note that .class files can be easily decompiled (although decompilers cannot recover 100% of the information from the sources). To make decompilation more difficult, you can use an obfuscator to make the decompiled code much less legible.