Do you know of a C++ library (open source, or free for non-commercial use) that can parse Java source code, for example from a jar file or a defined classpath? I want to extract classes, class members, methods, method calls and the relations between these artifacts.
I've spent all day googling for a solution. Either I'm blind, or can't read! :)
You can't get source code from a jar file, since a jar is really a set of (binary) class files. Assuming you mean the source code that might have been used to produce a jar file, there's a decent answer.
If you want an open source solution, you can try ANTLR, which has a Java 1.5 grammar and AFAIK will build an AST. From that you can "extract" the subtrees for the items you want, or at least their line numbers; from there, you can extract the code you want.
I believe ANTLR can be configured to produce a C++-based parser.
To capture relations between these, you need full name and type resolution, so you know which definition an identifier actually references. For this, ANTLR being just a parser won't do the trick; you need to live a Life After Parsing.
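An alternative might be the Java compiler itself: javac exposes a parsing API (the Compiler Tree API in com.sun.source.tree, reachable through javax.tools). It is a Java API, though, so from C++ you would have to call it out of process or through JNI.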
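Here is a minimal sketch of what using that API looks like. It is Java, not C++, and assumes a JDK 9+ rather than a plain JRE; it parses a source string with javax.tools and the com.sun.source tree API and lists classes, methods and call sites. The class name SourceOutline and the output format are made up for illustration. It only parses, so a call is recorded by the text of its method selector; resolving which definition a call actually refers to (the name and type resolution mentioned above) would additionally need JavacTask.analyze() and a proper classpath.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class SourceOutline {

    // Wraps a source string so javac can read it without a file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses (syntax only, no name resolution) and lists classes, methods and calls.
    public static List<String> outline(String className, String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> out = new ArrayList<>();
        // The String parameter carries the enclosing "Class" or "Class.method" context downward.
        TreeScanner<Void, String> scanner = new TreeScanner<>() {
            @Override
            public Void visitClass(ClassTree node, String owner) {
                String name = node.getSimpleName().toString();
                out.add("class " + name);
                return super.visitClass(node, name);
            }

            @Override
            public Void visitMethod(MethodTree node, String owner) {
                String name = owner + "." + node.getName();
                out.add("method " + name);
                return super.visitMethod(node, name);
            }

            @Override
            public Void visitMethodInvocation(MethodInvocationTree node, String owner) {
                out.add("call " + owner + " -> " + node.getMethodSelect());
                return super.visitMethodInvocation(node, owner);
            }
        };
        for (CompilationUnitTree unit : task.parse()) {
            scanner.scan(unit, null);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String src = "class Foo { void bar() { baz(); } void baz() {} }";
        outline("Foo", src).forEach(System.out::println);
    }
}
```

From C++ you would most likely run something like this as a separate process and read its text output, rather than embedding a JVM.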
There are a number of decompilers available for Java. These aren't necessarily written in C++, but they can convert Java class files and libraries back into source.
Examples: JD Core, DJ Java Decompiler.
I was wondering whether it is possible to take an ANTLR grammar (*.g), or the parsers generated from it, and use it in a separate project?
For this I was looking into the SysMLv2 (Eclipse-based) project on GitHub, where Xtext was used to define the grammar of this new modelling language. The grammar and the generated parsers can be found here.
My first idea was just to take the grammar file (InternalAlf.g) and use ANTLR (I tried 3.5.0 and 3.5.2) to generate the parser and lexer. Doing this, I end up with a bunch of error messages saying that symbols were not found (the symbol in question: EObject).
Then, since it is obviously an Eclipse project, I figured another naive solution would be to package the whole project as a jar and include it as a library in mine. I tried to use Eclipse for that (Export -> executable jar). That option requires a main class, and I am not sure which one to pick, which also makes me doubt this approach. The other jar export option does not let me add the necessary dependencies to my jar.
Does anyone have other proposals? Since the ANTLR grammar file is available, it should (in principle) be quite easy to generate the parser, but I am not sure how to do this, since the grammar file has a bunch of dependencies. To rephrase the question: how do I deal with this type of ANTLR grammar file (one that has dependencies on Java libraries)? As a newbie to ANTLR and Xtext, I could not find the answer in the typical ANTLR tutorials.
best regards
I looked at the grammar in that project. It is HIGHLY specific to Xtext (to the point that it's a bit difficult to find the ANTLR grammar amongst all of the actions).
You might be able to use the ANTLR3 grammar to parse it and discard all of the actions, etc. that make it so tightly coupled to Xtext (being careful about any semantic predicates and dependencies they might have on those actions). Emphasis on the MIGHT here.
In short, it’s not going to be at all simple to generate a parser divorced from Xtext using this grammar.
If you elaborate on what you need to accomplish that the Xtext SysMLv2 tooling itself can't do, and why you feel the need to create a separate parser, someone might be able to point you in an appropriate direction.
I am trying to develop a tool that analyses an Android app APK and counts the number of calls to a specific API method (e.g., android.app.AlarmManager.set()).
1. What approach do you recommend?
So far I have used APKTool and now I have .smali files.
However, for the same Java source file, I can have multiple .smali files:
ExportAsyncTask$1.smali
ExportAsyncTask$2.smali
ExportAsyncTask$3.smali
ExportAsyncTask$4.smali
2. What do these multiple files mean?
3. The resulting .smali files also include external libraries that I would like to leave out of the analysis. How can I do that?
1. What approach do you recommend?
Yes, Apktool can be used for your task. Each Java class in the APK is represented by a .smali file in a directory tree that mirrors the packages. Smali is an assembly-like language for the bytecode of Android's virtual machine (Dalvik). It is much simpler than Java and hence easier to analyse. In your case you should search for invoke opcodes that reference Landroid/app/AlarmManager;->set. On Linux you can, for example, grep for them and count the matches; on Windows, text editors like Notepad++ can search across multiple files.
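A sketch of the same search in Java, in case grep is not available or you want to build on it inside your tool. It assumes root is the smali/ directory that apktool produced; the class name SmaliCallCounter is hypothetical. It is a purely textual match, so it only counts call sites whose symbolic reference names AlarmManager itself; a call made through a subclass reference, via reflection, or inside a wrapper library would be counted differently or missed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SmaliCallCounter {

    // Counts invoke-* instructions in all .smali files under root that reference methodRef.
    public static long countCalls(Path root, String methodRef) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".smali"))
                    .mapToLong(p -> countInFile(p, methodRef))
                    .sum();
        }
    }

    private static long countInFile(Path file, String methodRef) {
        try {
            return Files.readAllLines(file).stream()
                    .map(String::trim)
                    // Only instruction lines count, not e.g. string constants mentioning the name.
                    .filter(line -> line.startsWith("invoke-") && line.contains(methodRef))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(countCalls(root, "Landroid/app/AlarmManager;->set("));
    }
}
```

Matching on ->set( rather than ->set is deliberate: the shorter string would also count setExact, setRepeating and similar methods.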
2. What do these multiple files mean?
In Java there are inner classes and anonymous inner classes. The former appear as OuterClass$InnerClass.smali, and the latter, having no name of their own, get numbers, like OuterClass$1.smali. Sometimes you even get deeper nesting levels, like a$b$c$1.smali.
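If you want your counts per source class rather than per .smali file, the naming scheme above can be folded back with a small helper. This is a heuristic sketch with hypothetical names: '$' is also legal in ordinary Java identifiers and obfuscators like to use it, so a name cut at the first '$' is not guaranteed to be a real outer class.

```java
public class SmaliNames {

    // Maps a .smali file name to its top-level class by cutting at the first '$'.
    public static String outerClass(String smaliFileName) {
        String base = smaliFileName.endsWith(".smali")
                ? smaliFileName.substring(0, smaliFileName.length() - ".smali".length())
                : smaliFileName;
        int dollar = base.indexOf('$');
        return dollar < 0 ? base : base.substring(0, dollar);
    }

    // True when the innermost segment is purely numeric (Foo$1), i.e. an anonymous class.
    public static boolean isAnonymous(String smaliFileName) {
        String base = smaliFileName.replaceFirst("\\.smali$", "");
        int dollar = base.lastIndexOf('$');
        return dollar >= 0 && dollar < base.length() - 1
                && base.substring(dollar + 1).chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        String[] names = {"ExportAsyncTask$1.smali", "a$b$c$1.smali", "OuterClass$InnerClass.smali"};
        for (String f : names) {
            System.out.println(f + " -> " + outerClass(f) + (isAnonymous(f) ? " (anonymous)" : ""));
        }
    }
}
```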
3. The resulting .smali files also include external libraries that I would like to leave out of the analysis. How can I do that?
There is no precise way to do that. But generally speaking, when you look at the package/directory tree of many samples, you usually grasp the pattern: e.g. application code is usually in com.* packages, while android.*, org.* and uk.* contain libraries. After such an inspection, you simply exclude those directories from your search.
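That exclusion can be done with a prefix filter over the package tree. A minimal sketch, assuming root is apktool's smali/ directory and that you fill the prefix list (hypothetical here) from your own inspection of the samples; SmaliFilter is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SmaliFilter {

    // Returns the .smali files under root whose package path does not start
    // with any of the excluded prefixes (e.g. "android", "org", "android/support").
    public static List<Path> appFiles(Path root, List<String> excludedPrefixes) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".smali"))
                    .filter(p -> {
                        Path rel = root.relativize(p);
                        return excludedPrefixes.stream().noneMatch(prefix -> rel.startsWith(prefix));
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Path.startsWith compares whole path segments, so excluding "org" drops org/... but keeps a package such as organic/...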
I need to extract some specific functionality from a large legacy Java codebase, in order to turn it into a standalone command-line application. This code is not documented at all and is not very modular or even clear. So I'm having a really hard time figuring out what I need to keep.
Basically what I need is a dependency tree, listing all the direct or indirect dependencies of this one *.java file. (Preferably I would like this listing to be in a format that I can save to a text file, as opposed to some un-copy-able whiz-bang GUI tree with a bazillion collapsed nodes...)
I'm using Eclipse for this detective work. I am an Eclipse beginner, but I figure that there may be Eclipse tricks/tools to perform this kind of operation with a bit less effort.
Any suggestions (using Eclipse or otherwise) would be appreciated.
There's a free version of eUML2 (http://www.soyatec.com/euml2/features/eDepend/); one of its features is exactly what you need. There is also another one. I'm not sure whether eUML2 can export text files.
Here is a fairly detailed guide to installing eUML2.
I've used Dependency Finder for this kind of work recently and it works well.
You can use Javadoc generation to produce documentation that, in this case, will not contain much information about the methods, but will show you which classes extend which classes, which interfaces they implement and so on, giving you a sort of dependency tree.
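Whichever tool you use to extract the direct class-to-class dependencies, the "direct or indirect" listing asked for in the question is their transitive closure, which is a plain graph traversal. A minimal breadth-first sketch, assuming the direct dependencies are already available as a map (the class name and the input below are hypothetical). It prints one class name per line so the result can be redirected into a text file.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyClosure {

    // Breadth-first walk from start over a class -> direct-dependencies map.
    // Returns every direct or indirect dependency once, in discovery order;
    // the visited set also stops the walk from looping on dependency cycles.
    public static Set<String> transitiveDeps(String start, Map<String, List<String>> direct) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dep : direct.getOrDefault(current, List.of())) {
                if (!dep.equals(start) && seen.add(dep)) {
                    queue.add(dep);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical input, including a cycle (Lexer -> Report).
        Map<String, List<String>> direct = Map.of(
                "Report", List.of("Parser", "Formatter"),
                "Parser", List.of("Lexer"),
                "Formatter", List.of("Parser"),
                "Lexer", List.of("Report"));
        transitiveDeps("Report", direct).forEach(System.out::println);
    }
}
```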
Does anyone know a tool for Java (something like CodeDOM for C#) that provides a way to generate Java code to a .java file?
EDIT:
I'm building a platform whose main objective is to automate an operation. Given some input, I want to generate code for an external tool. So it isn't generation at runtime: I want to generate the code and output it to an actual file.
JET, maybe outdated (I haven't used it): JET Tutorial Part 1
More plugins: Eclipse Plugins in Code Generation
EDIT:
Sorry, I don't know CodeDOM or what features it provides.
Standalone options are FreeMarker and Velocity (see also this example).
I have had some success using ASM to both modify existing classes at the bytecode level or to generate completely new classes on the fly. The tutorial walks you through this in a very understandable fashion.
ASM, like most such tools, generates bytecode, not source. The reason is that if you want to dynamically generate and execute new code from within a program, historically it was not straightforward to invoke the Java compiler, so it was generally easier to generate and use bytecode than source.
If you need to generate and run the code immediately within your program, I recommend you use a bytecode manipulation tool. If all you need is Java source, I would roll my own code generator that takes my input format and generates the code. You may want to look for a framework to help you with this, but since a source file is just text, it is usually just as easy to write one yourself, especially if you have a custom input format.
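As an illustration of rolling your own, here is a minimal generator for a made-up one-line input format ("Person: name String, age int") that emits a class with private fields and getters and writes it to Person.java. Both the format and the name BeanGenerator are assumptions; a real input format would need validation and error handling that this sketch skips.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BeanGenerator {

    // Hypothetical input format: "ClassName: field Type, field Type, ..."
    public static String generate(String spec) {
        String[] head = spec.split(":", 2);
        String className = head[0].trim();
        StringBuilder fields = new StringBuilder();
        StringBuilder getters = new StringBuilder();
        for (String decl : head[1].split(",")) {
            String[] parts = decl.trim().split("\\s+");
            String name = parts[0];
            String type = parts[1];
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            fields.append("    private ").append(type).append(' ').append(name).append(";\n");
            getters.append("\n    public ").append(type).append(" get").append(cap)
                   .append("() {\n        return ").append(name).append(";\n    }\n");
        }
        return "public class " + className + " {\n" + fields + getters + "}\n";
    }

    public static void main(String[] args) throws IOException {
        String spec = args.length > 0 ? args[0] : "Person: name String, age int";
        String className = spec.split(":", 2)[0].trim();
        String source = generate(spec);
        // A public class must live in <ClassName>.java for javac to accept it.
        Files.writeString(Path.of(className + ".java"), source);
        System.out.print(source);
    }
}
```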
ABSE and AtomWeaver form a code generation and model-driven development framework where you can easily implement what you want. ABSE is a new methodology where you compose your code generator from smaller bits (called Atoms), and AtomWeaver is a straightforward IDE that lets you implement, manipulate and use your generator models.
It also allows non-programmers to build programs/configurations/whatever, made from already-built parts (Atoms you have previously prepared).
This project is just now being publicly launched, and an alpha version is available. ABSE is open, and AtomWeaver is free for personal and commercial use.
Get more info here: http://www.abse.info (Disclaimer: I am the project lead)
What you could try is to use an existing grammar (e.g. from ANTLR) to build an AST, and then generate the code from the AST. That should be much more robust than simple templating. For something in between, I suggest Terence Parr's (eye-opening) talk about StringTemplate. (Sorry, I don't have the link to the talk at hand.)
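To show the difference from flat templating, here is a minimal sketch of the generation half only: a tiny hand-built AST (hypothetical record types, not ANTLR's or StringTemplate's) rendered to Java source, with indentation derived from nesting depth instead of being hard-coded in template strings. It needs Java 16+ for records and pattern matching.

```java
import java.util.List;

public class AstCodeGen {

    // A deliberately tiny, hypothetical AST for Java declarations.
    interface Node {}
    record Field(String type, String name) implements Node {}
    record Method(String returnType, String name, List<String> body) implements Node {}
    record ClassDecl(String name, List<Node> members) implements Node {}

    // Emits source for a node; each nesting level adds four spaces of indentation.
    static void emit(Node node, int depth, StringBuilder out) {
        String pad = "    ".repeat(depth);
        if (node instanceof ClassDecl c) {
            out.append(pad).append("public class ").append(c.name()).append(" {\n");
            for (Node member : c.members()) {
                emit(member, depth + 1, out);
            }
            out.append(pad).append("}\n");
        } else if (node instanceof Field f) {
            out.append(pad).append("private ").append(f.type()).append(' ').append(f.name()).append(";\n");
        } else if (node instanceof Method m) {
            out.append(pad).append("public ").append(m.returnType()).append(' ')
               .append(m.name()).append("() {\n");
            for (String stmt : m.body()) {
                out.append(pad).append("    ").append(stmt).append('\n');
            }
            out.append(pad).append("}\n");
        } else {
            throw new IllegalArgumentException("unknown node: " + node);
        }
    }

    public static String render(Node root) {
        StringBuilder out = new StringBuilder();
        emit(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Node tree = new ClassDecl("Counter", List.of(
                new Field("int", "count"),
                new Method("int", "next", List.of("return ++count;"))));
        System.out.print(render(tree));
    }
}
```

In the approach described above, the tree would come from parsing with an existing grammar rather than being built by hand as in main.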
I am not sure what you really need, but take a look at Javassist. Is it what you are looking for?
I want to parse some data, and I have a BNF grammar to parse it with. Can anyone recommend any grammar compilers capable of generating code that can be used on a mobile device?
Since this is for JavaME, the generated code must be:
Hopefully pretty small
Low dependencies on exotic Java libraries
Not dependent on any runtime jar files.
I have used JFlex before, and I know it satisfies your second and third requirements. But I don't know how big the generated code might be. According to the manual, it generates a packed DFA table by default, so it might not be too bad.
The first question is: do you have an existing grammar definition? When I've ported an LALR grammar to Java, I've used JFlex/CUP.
If you're starting from scratch, I'd suggest JavaCC/FreeCC, which generates LL(k) parsers. It's quite well documented and there are no runtime dependencies.
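JavaCC generates recursive-descent (LL(k)) parsers, and the shape of such code is simple enough to write by hand for a small grammar if a generator is not an option. The following is a hand-written sketch, not JavaCC output, for a hypothetical arithmetic grammar given in the comments. It uses only a few java.lang classes and no collections or generics, which should keep it small and suitable for JavaME, although it has not been tested on a CLDC device.

```java
public class ExprParser {
    private final String input;
    private int pos;

    private ExprParser(String input) {
        this.input = input;
        this.pos = 0;
    }

    // Parses and evaluates the whole string, rejecting trailing garbage.
    public static int evaluate(String text) {
        ExprParser p = new ExprParser(text);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return value;
    }

    // expr ::= term (("+" | "-") term)*
    private int expr() {
        int value = term();
        while (true) {
            skipSpaces();
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term ::= factor (("*" | "/") factor)*
    private int term() {
        int value = factor();
        while (true) {
            skipSpaces();
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor ::= NUMBER | "(" expr ")"
    private int factor() {
        skipSpaces();
        if (accept('(')) {
            int value = expr();
            skipSpaces();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected number at " + pos);
        return Integer.parseInt(input.substring(start, pos));
    }

    // Consumes c if it is the next character.
    private boolean accept(char c) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }
}
```

Each grammar rule maps to one method, and the loops implement the repetition; that one-rule-per-method structure is what makes LL parsers easy to read and debug.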