Trying to find a way to count the number of Java classes and methods that either have comments or need comments. We are trying to document all our code, but it's going to take a while, and we would like to post metrics on how far along we are. We are using Doxygen to convert Javadoc to web pages. I haven't found a way to do this with Doxygen yet, but that doesn't mean it isn't there.
Turn on all the warnings in Doxygen, and then get a line count of the log file.
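For reference, a minimal sketch of the relevant Doxyfile settings (option names are from the standard Doxygen configuration; the log file name is my choice):

    EXTRACT_ALL          = NO    # must be NO, or undocumented-member warnings are suppressed
    WARN_IF_UNDOCUMENTED = YES   # one warning per undocumented class/member
    WARN_LOGFILE         = undocumented.log

Each undocumented class or member then produces one warning line, so something like wc -l undocumented.log gives a rough count of what still needs documenting.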
I've used QDox in the past for source code analysis. It parses java sources into a nice model for gathering statistics / automated code generation etc.
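For example, a minimal sketch of counting documented vs. undocumented methods with QDox (assuming the QDox 2.x model API; the source path is a placeholder):

    import java.io.File;
    import com.thoughtworks.qdox.JavaProjectBuilder;
    import com.thoughtworks.qdox.model.JavaClass;
    import com.thoughtworks.qdox.model.JavaMethod;

    public class CommentStats {
        public static void main(String[] args) {
            JavaProjectBuilder builder = new JavaProjectBuilder();
            builder.addSourceTree(new File("src/main/java")); // placeholder path
            int documented = 0, undocumented = 0;
            for (JavaClass cls : builder.getClasses()) {
                for (JavaMethod method : cls.getMethods()) {
                    // getComment() yields the Javadoc text, or null when there is none
                    if (method.getComment() != null) documented++; else undocumented++;
                }
            }
            System.out.println(documented + " documented, " + undocumented + " undocumented methods");
        }
    }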
As I was looking through the Java source code, I found some unusual files, mostly related to ByteBuffers in the java.nio package, which had very messy source code and were labelled "This file was mechanically generated: Do not edit!".
These files also contained large portions of blank lines (some even in the middle of Javadocs!), presumably to keep the line numbers from changing. I have also seen a few Java decompilers, such as procyon-decompiler, which have an option to keep line numbers, but I doubt that's what happened here, because putting blank lines before the final closing brace changes nothing.
Here are a few of these files (I couldn't find any links to them online and didn't pastebin them because I don't want to break any copyright, but you can find them in the src.zip folder at the root of your JDK installation folder):
java.nio.ByteBuffer
java.nio.DirectByteBufferR
java.nio.Bits
java.nio.BufferOverflowException
I'd be curious to know:
Which tool generated these files?
Why does the tool keep the line numbers the same? Is it to make debugging (stacktraces) easier?
Why would a tool be used to generate them, while all other classes are programmed by humans?
Why would the tool put blank lines seemingly at random inside parentheses, before the final closing brace, or even inside Javadocs?
I probably can't answer all of the questions, but here is some background:
In the Makefile at http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/make/java/nio/Makefile, they are generating different java source files from the same template file through some preprocessor:
...
$(BUF_GEN)/CharBuffer.java: $(X_BUF_TEMPLATE) $(GEN_BUFFER_SH)
	$(prep-target)
	@$(RM) $@.temp
	TYPE=char SRC=$< DST=$@.temp $(GEN_BUFFER_CMD)
	$(MV) $@.temp $@
$(BUF_GEN)/ShortBuffer.java: $(X_BUF_TEMPLATE) $(GEN_BUFFER_SH)
	$(prep-target)
	@$(RM) $@.temp
	TYPE=short SRC=$< DST=$@.temp $(GEN_BUFFER_CMD)
	$(MV) $@.temp $@
...
$(X_BUF_TEMPLATE) refers to X-Buffer.java.template, which is the source for typed buffers like CharBuffer, ShortBuffer and some more.
Note: the URLs might change in the future. Also, sorry for referring to Java 7: the build system was modified in Java 8, and I have not found the corresponding Makefiles so far.
Which tool generated these files?
GEN_BUFFER_SH / GEN_BUFFER_CMD finally refers to genBuffer.sh, so the script which creates these files is http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/make/java/nio/genBuffer.sh.
Why would a tool be used to generate them, while all other classes are programmed by humans?
I don't have an authoritative answer for this specific case, but usually you use code generation tools:
if you need to create a lot of similar classes/methods which differ only in some detail, but one subtle enough that you cannot use established mechanisms like generics or method parameters (probably the case here, since the buffers are generated for primitive types, which cannot be used with generics; see the sketch after this list);
if you need to create complex algorithms from a much simpler representation (like generating parsers from a grammar).
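To make the first case concrete, here is a hypothetical sketch of the template-expansion idea. The placeholder markers ($Type$, $type$) and file names are made up for illustration; the real mechanism is the genBuffer.sh shell preprocessor with its own syntax:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class GenBuffers {
        public static void main(String[] args) throws Exception {
            String template = Files.readString(Path.of("X-Buffer.java.template"));
            for (String type : List.of("char", "short", "int", "long", "float", "double")) {
                String cap = Character.toUpperCase(type.charAt(0)) + type.substring(1);
                // Substitute the primitive type into the template; generics cannot
                // express this, because type parameters do not accept primitives.
                String source = template.replace("$Type$", cap).replace("$type$", type);
                Files.writeString(Path.of(cap + "Buffer.java"), source);
            }
        }
    }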
Why does the tool keep the line numbers the same? Is it to make debugging (stacktraces) easier?
I am guessing: yes, it's to retain the line numbers in stack traces so that they match the template files. Other tools, like the C preprocessor, work similarly.
I am looking for ways to remove all the annotations from existing Java source code. I am looking for an Ant task or any other approach. I have seen some solutions that do this at the class-file level, but I am looking to do this at the source-code-to-source-code level.
I have done this through the Java parser code available in Lombok.
Look at these methods, which have the logic:
lombok.javac.handlers.JavacHandlerUtil#deleteAnnotationIfNecessary
lombok.javac.handlers.JavacHandlerUtil#deleteImportFromCompilationUnit
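If you'd rather not depend on Lombok's internals, a standalone sketch with the JavaParser library could look like this (the annotation name and file path are placeholders; leftover imports would still need separate cleanup):

    import java.io.File;
    import java.nio.file.Files;
    import com.github.javaparser.StaticJavaParser;
    import com.github.javaparser.ast.CompilationUnit;
    import com.github.javaparser.ast.expr.AnnotationExpr;

    public class StripAnnotations {
        public static void main(String[] args) throws Exception {
            File file = new File("src/main/java/com/example/Entity.java"); // placeholder
            CompilationUnit cu = StaticJavaParser.parse(file);
            // Remove every @NamedQueries annotation node from the AST
            for (AnnotationExpr a : cu.findAll(AnnotationExpr.class)) {
                if (a.getNameAsString().equals("NamedQueries")) {
                    a.remove();
                }
            }
            // Pretty-print the modified source back over the original file
            Files.writeString(file.toPath(), cu.toString());
        }
    }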
I ended up using jEdit, which has brilliant regular expression support.
I wanted to replace specific annotations (I wanted to keep stuff like @Override). You can easily do that for all open buffers or a directory tree.
Just write some simple expressions for the annotations you want to remove. For example:
^\s*@NamedQueries\(\n\{[^\}]+\}\)\n
I'm working on a simple parser to transform Java interfaces and value objects to C#. This is done so that a C# client for communicating with the Java JMS server can be created automatically.
My parser is almost finished: I can read generics information, reuse C# types, and even merge getter and setter methods into properties. The only thing I can't do, because it isn't possible with reflection, is read the parameter names of methods in an interface. I found a library (BCEL) and can read the parameter names of "real" methods in classes, but not within an interface.
So my idea was: either way, it would be cool to have the original Java comments transferred into .NET as well, so I could use them, and I could use the very same tool to get the parameter names, since a tool that reads the source can read those too.
So my question: do you know of any library I could use for this? I have the generated Javadocs and also the source code, which I could use as input for the tool.
If you have access to the source code, the easiest way would be to use a custom Javadoc doclet. This gets access to all the declarations (including parameter names), and also all comments. You can then convert it in any format you want.
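To sketch the doclet idea, here is a minimal example using the old com.sun.javadoc doclet API (which matches this era of Java; the class name is mine). From start() you could emit C# stubs or any intermediate format instead of printing:

    import com.sun.javadoc.*;

    public class ExtractDoclet {
        // Entry point called by the javadoc tool, e.g.:
        //   javadoc -doclet ExtractDoclet -docletpath . -sourcepath src com.example
        public static boolean start(RootDoc root) {
            for (ClassDoc cls : root.classes()) {
                System.out.println(cls.qualifiedName() + ": " + cls.commentText());
                for (MethodDoc method : cls.methods()) {
                    System.out.print("  " + method.name() + "(");
                    for (Parameter p : method.parameters()) {
                        // Parameter names survive here even for interfaces
                        System.out.print(p.typeName() + " " + p.name() + ", ");
                    }
                    System.out.println(") -- " + method.commentText());
                }
            }
            return true;
        }
    }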
If you only have the Javadoc output, I suppose most IDEs have some way of parsing it. Have a look at Eclipse or NetBeans; maybe their Javadoc parsing code is extractable.
Is there a simple-to-use Java library that can take a String and return the set of Strings which are its keywords/keyphrases?
It doesn't have to be particularly clever, just use stop words and stemming to match keywords.
I am looking at the KEA package http://code.google.com/p/kea-algorithm/ but I can't figure out how to use their code.
Ideally something simple which has a little example documentation would be good. In the meantime I will set about writing this myself!
EDIT: When I say I can't figure out how to use their code, I mean I can't see a simple way. The individual classes by themselves have useful methods that will do much of the work.
This is a fairly old question, and the OP has probably already solved the problem, but I am putting this here for others who may stumble upon the question looking for how to use KEA.
For KEA, you will need a training set - some of your documents will need to have keywords already set. The training data consists of a directory of documents (.txt files) and corresponding keywords files (.key files), with one keyword per line. You train KEA on this set, then use the model to extract keywords on the rest of your documents, which are in another directory of .txt files. KEA will write out corresponding .key files in this directory.
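Hypothetically, the directory layout described above would look something like this (all file names made up):

    train/
        doc1.txt    doc1.key    (document text plus its known keywords, one per line)
        doc2.txt    doc2.key
    test/
        doc3.txt                (KEA writes doc3.key here after extraction)
        doc4.txt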
For more information, take a look at one or more of the following:
1) The KEA source distribution has a TestKEA.java class which shows how to extract keywords from a small test corpus. The README has details on the directory format required.
2) This blog post has (somewhat terse, IMO) instructions on how to use KEA.
http://kea-pranay.blogspot.com/2010/02/kea-key-extraction-algorithm.html
3) My blog post, which I wrote up last weekend while trying to learn how to generate keywords from a corpus I had (one already manually annotated with keywords). It has Python code to pre-process the data into the form KEA expects, Scala code (KEA provides a Java API) to train and run the extractor, and Python code to analyze and visualize the generated keywords.
http://sujitpal.blogspot.com/2014/08/keyword-extraction-with-kea.html
You might try the Porter stemming algorithm: the Java version is at http://tartarus.org/~martin/PorterStemmer/java.txt and the main page is at http://tartarus.org/~martin/PorterStemmer/. It's old, but it doesn't do a bad job.
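If you roll your own, a minimal usage sketch of that Stemmer class (its published API feeds the word in one character at a time) might look like:

    public class StemDemo {
        public static void main(String[] args) {
            // Assumes the Stemmer class from the link above is on the classpath
            Stemmer stemmer = new Stemmer();
            for (char c : "keywords".toCharArray()) {
                stemmer.add(c); // feed the word one character at a time
            }
            stemmer.stem();
            System.out.println(stemmer.toString()); // prints "keyword"
        }
    }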
I am new to Java and I came across a statement in a Java project which says:
Digester digester = DigesterLoader.createDigester(getClass()
.getClassLoader().getResource("rules.xml"));
The rules.xml file contains various patterns, and every pattern has different attributes like classname, methodname and some other properties.
I googled Digester but couldn't find anything useful that would help me with the statement above. Can anyone tell me the steps followed in executing the above statement? In fact, what is the advantage of all this XML stuff?
swapnil, as a user of Digester back in my Struts days, I can honestly say it's tricky to learn/debug. It's a tough library to familiarize yourself with. Essentially you are setting up event handlers for certain elements, kind of like a SAX parser (in fact it's using SAX behind the scenes). So you feed a rules engine some XPath-like patterns for the nodes you are interested in, and set up rules which will instantiate POJOs and set properties on them with data found in the XML file.
Great idea, and once you get used to it it's good; however, if you have an XSD for your input XML file, I'd sooner recommend you use JAXB.
The one thing that is nice about Digester is it will only do things with elements you are interested in, so memory footprint ends up being nice and low.
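To give a feel for those rules: the question's rules.xml expresses them declaratively and DigesterLoader wires them up, but a hypothetical programmatic equivalent (the Catalog/Book beans and catalog.xml are made up for illustration) would be:

    import java.io.File;
    import org.apache.commons.digester.Digester;

    public class CatalogParser {
        public static void main(String[] args) throws Exception {
            Digester digester = new Digester();
            digester.addObjectCreate("catalog", Catalog.class);    // <catalog> -> new Catalog
            digester.addObjectCreate("catalog/book", Book.class);  // nested <book> -> new Book
            digester.addSetProperties("catalog/book");             // XML attributes -> bean setters
            digester.addSetNext("catalog/book", "addBook");        // catalog.addBook(book)
            Catalog catalog = (Catalog) digester.parse(new File("catalog.xml"));
            System.out.println(catalog.getBooks().size() + " books parsed");
        }
    }

    class Catalog {
        private final java.util.List<Book> books = new java.util.ArrayList<Book>();
        public void addBook(Book b) { books.add(b); }
        public java.util.List<Book> getBooks() { return books; }
    }

    class Book {
        private String title;
        public void setTitle(String title) { this.title = title; } // matches title="..." attribute
    }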
This is the method that's getting called here. XML is commonly used in Java for configuration, since XML files do not need to be compiled; having the same configuration in a Java file would mean recompiling it on every change.
I assume you understand how the rules file is being loaded using the class loader? Since it goes through getClassLoader().getResource(), the name is resolved against the root of the classpath (not the class's own package), producing a URL that gives the file's absolute location.
As for Digester, I've not used it, but a quick read of this (http://commons.apache.org/digester/) should explain it all.
They used it at my last gig and all I remember is that it was extremely slow.