Parsing .java file to extract all features - java

I need to parse a .java file to do static analysis and extract information from it, such as:
variables
methods
annotations
inner classes
...
I need to parse it at runtime, something like this:
JavaClass c = parse("file.java");
c.getMethods();
I am not sure whether such a tool already exists; if not, can you please give me some advice on how to build it?

You definitely need to use a good parser for that.
ANTLR is a parser generator that comes with a huge library of ready-to-use grammars, one of which is for Java. With that grammar it is easy to transform a Java source file into an abstract syntax tree. You can then run whatever analysis or transformation you need on that tree.

Have a look at the Javadoc API. It provides everything you need, and you do not even need to write a parser of your own.
Admittedly, it is far from intuitive at first, but the good thing is that it works along the same lines as an AnnotationProcessor, so you learn two things in one go.
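As a rough illustration of getting a syntax tree without writing a parser, here is a minimal sketch. Note that it swaps in the JDK's Compiler Tree API (com.sun.source, driven through javax.tools) rather than the Javadoc/doclet API itself. It assumes a full JDK (not a bare JRE) on Java 9 or later; the names SourceOutline, StringSource and outline, and the sample source, are made up for this example.

```java
import com.sun.source.tree.AnnotationTree;
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists classes, fields, methods and annotations found in Java source text.
class SourceOutline {

    // Wraps a source string so the compiler can read it without a file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static List<String> outline(String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, source)));

        List<String> found = new ArrayList<>();
        // parse() only builds syntax trees; nothing is resolved or compiled.
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitClass(ClassTree node, Void unused) {
                    found.add("class " + node.getSimpleName());
                    // Only direct members of the class count as fields (not locals or parameters).
                    for (Tree member : node.getMembers()) {
                        if (member instanceof VariableTree) {
                            found.add("field " + ((VariableTree) member).getName());
                        }
                    }
                    return super.visitClass(node, unused); // keep descending into members
                }

                @Override
                public Void visitMethod(MethodTree node, Void unused) {
                    found.add("method " + node.getName());
                    return super.visitMethod(node, unused);
                }

                @Override
                public Void visitAnnotation(AnnotationTree node, Void unused) {
                    found.add("annotation " + node.getAnnotationType());
                    return super.visitAnnotation(node, unused);
                }
            }.scan(unit, null);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String source = "class Sample {\n"
                + "    private int count;\n"
                + "    @Deprecated void run() {}\n"
                + "    class Inner {}\n"
                + "}\n";
        outline("Sample", source).forEach(System.out::println);
    }
}
```

To read a real file instead of a string, the same task can be fed file objects obtained from a StandardJavaFileManager; the string wrapper is used here only to keep the sketch self-contained.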

Is your task an academic one? If not, I would consider supporting or joining an existing static analysis tool project such as SpotBugs, PMD, ...
And does it really need to be .java files, or can it also be .class files? Most available static analysis tools for Java work on bytecode. So why reinvent the wheel?
SpotBugs, for instance, uses BCEL and ASM.

Related

Are there any Java Class Library "header files" containing all method descriptors in the standard library?

In order to create a valid .class file, every method has to have a full internal name and type descriptors associated with it. When procedurally creating these, is there some sort of lookup table one can use (outside of Java, where a ClassLoader can be used) to get these type descriptors from a method name? For example, how would one go from Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte() (or even from java.util.Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte())? The above example has overloading in it, which is another problem a human- but mostly computer-readable declarations file would hopefully address.
I've found many sources of human-readable documentation like https://docs.oracle.com/javase/8/docs/api/index.html containing uses of each method, hyperlinks to other places, etc. but never a simple text file or collection of files containing just declarations in any format. If there's no such file(s) don't worry about it, I can try and scrape some annoying HTML files, but if there is it would save a lot of time. Thanks!
The short answer is No.
There isn't a "header file" containing the class and method signatures for the Java class libraries. The Java tool chain has no need for such a thing. Nor do 3rd-party Java compilers, or compilers for other languages that rely on the Java SE class libraries.
AFAIK, there isn't a 3rd-party tool that builds such a file or an equivalent database or in-memory data structures.
You could create one though.
You could choose an existing Java parsing library, use it to build parse trees for all of the source files in the class library, and emit the information that you need.
You could potentially create a custom Javadoc "doclet" plugin to emit the information.
Having said that, I don't understand why you would need such a mapping. Surely, your IDE does this already ... and exposes the information via some internal API. And if this is not for an IDE plugin, what is it for?
You commented:
I'm making a compiler for a JVM-based programming language ....
Ah ... so your compiler should do what other compilers do. Get the information from the ".class" file. You can either load the class using a standard or custom class loader, or you can use a library like asm or bcel or javassist ... which can read a ".class" file without loading it.
(I haven't checked, but I think the standard javac compiler uses an internal API to do this.)
Note that your proposed approaches won't work for interfacing with 3rd-party Java libraries where the source code is not available and/or the javadoc is not scrapable.
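As a minimal sketch of the "load the class using a standard class loader" option, the following uses reflection together with java.lang.invoke.MethodType to print the JVM descriptors of every public overload of a method. The class name DescriptorLookup and the helper descriptors are made up for this example. It runs inside a JVM, so it does not by itself give you a lookup table usable outside Java, although its output could be dumped to a text file.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: maps a class name plus method name to the JVM descriptors of its overloads.
class DescriptorLookup {

    static SortedSet<String> descriptors(String className, String methodName)
            throws ClassNotFoundException {
        Class<?> type = Class.forName(className);
        SortedSet<String> result = new TreeSet<>();
        // getMethods() returns public methods, including inherited ones.
        for (Method method : type.getMethods()) {
            if (method.getName().equals(methodName)) {
                MethodType signature =
                        MethodType.methodType(method.getReturnType(), method.getParameterTypes());
                result.add(signature.toMethodDescriptorString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Internal name of the class, as used in .class files.
        System.out.println("java.util.Scanner".replace('.', '/'));
        System.out.println(descriptors("java.util.Scanner", "hasNextByte"));
    }
}
```

For the Scanner.hasNextByte example from the question, this yields the two descriptors ()Z and (I)Z, one per overload.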
What about building it from the source files for the standard library?
The Oracle Java 8 API web pages you referenced were created by Javadoc processing of the source files for the Java standard library.
If you use an IDE with a debugger, there is a good chance you already have much of the standard library source code downloaded. After all, if you set a breakpoint and then follow the program step by step with "Step into", you can trace the execution of the program into standard library methods. The source files would be part of the JDK.
However, some parts of the standard library source might not be available, due to licensing restrictions.

Manipulating Java classes with Java

I would like to manipulate Java classes (.java source files, not .class files) so that I could:
Delete all methods of a class (keeping the constructor)
Add unimplemented methods
Remove unused imports
...
Is there an API that could accomplish this?
So far I have tried manipulating the .java files as text files (with regex, FileUtils, etc.).
Regards.
You could look at using the AST (Abstract Syntax Tree) tools from the Eclipse JDT project.
There is a tutorial to get you started at Vogella: Eclipse JDT - Abstract Syntax Tree (AST) and the Java Model - Tutorial
If you only want to temporarily modify the classes (i.e. within the scope of the jvm) then you could do this with reflection:
What is reflection and why is it useful?
If you're talking about permanently altering or creating source code, then this is maybe best done using an IDE. Most IDEs will tell you about unimplemented methods and provide auto-completion to create them. They will also format the source code, remove unused imports, etc.
You can use a regular expression; the question is then which regular expression (and what other options there are).
Regular expressions are not ideally suited to this. For a comparable task such as parsing XML, the usual advice is not to do it and to use an XML parser instead. In this case, though, if you find that no tool exists for parsing Java source code, regular expressions may be the best option.
Yes, you can use the Java Reflection API. Please check here
Later edit: To update the class structure you can use javassist. Here you have an example.

Remove all annotations in Java source code and get new source code

I am looking for ways to remove all the annotations from existing Java Source Code. I am looking for an ant task or any other approach. I have seen some solutions that do this at the class level, but I am looking to do this at the source code to source code level.
I have done this using the Java parser code available in Lombok.
Look at these methods, which contain the logic:
lombok.javac.handlers.JavacHandlerUtil#deleteAnnotationIfNecessary
lombok.javac.handlers.JavacHandlerUtil#deleteImportFromCompilationUnit
I ended up using jEdit, which has brilliant regular expression support.
I wanted to remove specific annotations only (I wanted to keep things like @Override). You can easily do that for all buffers or for a directory tree.
Just write some simple expressions for the annotations you want to remove. For example:
^\s*@NamedQueries\(\n\{[^\}]+\}\)\n
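The same regular-expression idea can also be scripted in Java instead of an editor. The following is only a sketch: the class AnnotationStripper and the method strip are made up for this example, and the pattern handles only an annotation that sits on its own line and whose argument list contains no nested parentheses. Annotations inside comments or string literals would be removed as well, so it is no substitute for a real parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes one named annotation from Java source text with a regex.
class AnnotationStripper {

    static String strip(String source, String annotationName) {
        Pattern pattern = Pattern.compile(
                "^[ \\t]*@" + Pattern.quote(annotationName)   // annotation at the start of a line
                        + "(?:\\([^)]*\\))?"                   // optional arguments, no nested ')'
                        + "[ \\t]*\\R",                        // the rest of the line and its break
                Pattern.MULTILINE);
        return pattern.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "@Entity\n"
                + "@Table(name = \"t\")\n"
                + "class A {\n"
                + "    @Override\n"
                + "    public String toString() { return \"a\"; }\n"
                + "}\n";
        // Removes the Table annotation only; Entity and Override are kept.
        System.out.print(strip(source, "Table"));
    }
}
```

Because the annotation name must be followed by an argument list or the end of the line, stripping "Named" leaves a longer name such as NamedQueries untouched.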

Modify Java sourcecode programmatically with Java or Groovy

To automate certain manual tasks in a legacy project, I need to modify existing Java files from within Java or Groovy code.
I don't want to use regex, because it would be neither quick nor clean in my opinion.
I found javassist and srcgen4javassist. The first one lets me modify my sources as I wish, but it only writes bytecode, losing all comments and annotations. With the second one, I didn't manage to read an existing class that was not created with srcgen4javassist itself.
Is there an elegant solution, or do I need to bite the bullet and use regex?
You could properly parse the code using something like Eclipse's ASTParser, at which point you could locate your replacement targets XPath-style, but it's a lot of work.
You could also consider marking replacement areas with an annotation and writing an annotation processor to generate/alter sources at compile time, but (at least in my opinion) the API is cumbersome.
You can combine a regexp with some marker in the source code, something like:
//START REPLACEMENT-TARGET
...code to be edited/replaced
//END REPLACEMENT TARGET
which would make your regexp targeting a lot safer.
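A minimal sketch of the marker approach, assuming the two marker comments are spelled exactly as above and that Java 9 or later is available (for Matcher.replaceAll with a function). The class MarkerReplace and the method replaceRegion are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps the code between the two marker comments, keeping the markers.
class MarkerReplace {

    private static final Pattern REGION = Pattern.compile(
            "(//START REPLACEMENT-TARGET\\R)"   // group 1: opening marker and its line break
                    + "(.*?)"                    // group 2: old body, matched lazily
                    + "(//END REPLACEMENT TARGET)", // group 3: closing marker
            Pattern.DOTALL);

    // newBody should end with a line break so the closing marker stays on its own line.
    static String replaceRegion(String source, String newBody) {
        Matcher matcher = REGION.matcher(source);
        // quoteReplacement keeps any '$' or '\' in the new code from being read as group references.
        return matcher.replaceAll(match ->
                Matcher.quoteReplacement(match.group(1) + newBody + match.group(3)));
    }

    public static void main(String[] args) {
        String source = "void run() {\n"
                + "//START REPLACEMENT-TARGET\n"
                + "oldCall();\n"
                + "//END REPLACEMENT TARGET\n"
                + "}\n";
        System.out.print(replaceRegion(source, "newCall();\n"));
    }
}
```

Everything outside the markers, including comments and annotations, is left byte for byte as it was, which is the main appeal over a bytecode round trip.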

How to obtain a list of fields in a class in Java?

I want to retrieve a list of the member variables of a specified class, along with other information such as data type, size, value, etc. This is possible using reflection. But is there any way other than reflection to get this information?
Thanks in advance.
The only other way I'm aware of is via source-code analysis, with tools like Spoon.
Yes, introspection may help you as an alternative to reflection.
Just use the field-related methods that the Class object of your class provides. See java.lang.Class.
Reflection. This is actually the easiest way to do it.
Parsing the source code with a generated parser (the ANTLR project has a Java grammar file). This is a little more complicated and requires additional dependencies in your project, and it is suitable only if you have the source code.
Reading the Java class file and analyzing it. This is the most complicated: you will have to write a Java bytecode parser to read the binary file. But it could be the fastest way (no additional dependencies, no LALR(k) parsing, no reflection-style overhead); you control what to read and how to read it, and it works with compiled Java code.
The question is: why do you think reflection is not suitable for you?
It was made much faster in Java 1.5 compared to previous Java releases.
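For comparison, here is a minimal sketch of the reflection option from the list above. The class FieldLister, the method describe and the sample class Point are made up for this example. Reflection reports each field's name, modifiers, declared type and current value; it has no notion of a field's size in memory.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: describes the declared fields of an object via reflection.
class FieldLister {

    // Sample class used only for the demonstration.
    static class Point {
        static final int DIMENSIONS = 2;
        int x = 3;
        private String label = "origin";
    }

    static Map<String, String> describe(Object target) throws IllegalAccessException {
        // TreeMap gives a stable order; getDeclaredFields() guarantees none.
        Map<String, String> fields = new TreeMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            // Needed for private fields; may be refused for JDK-internal classes on Java 9+.
            field.setAccessible(true);
            String modifiers = Modifier.toString(field.getModifiers());
            String declaration = (modifiers.isEmpty() ? "" : modifiers + " ")
                    + field.getType().getSimpleName();
            fields.put(field.getName(), declaration + " = " + field.get(target));
        }
        return fields;
    }

    public static void main(String[] args) throws IllegalAccessException {
        describe(new Point()).forEach((name, info) -> System.out.println(name + ": " + info));
    }
}
```

Only fields declared directly on the class are listed; inherited fields would need a walk up getSuperclass().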
The org.springframework.util.ReflectionUtils class is actually quite the helper in these cases.
Apache commons-lang package has a very useful tool: ReflectionToStringBuilder. Here is the link to javadoc: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/builder/ReflectionToStringBuilder.html
