Can I automatically generate code samples from IDE code templates? - java

I'm trying to find information (documentation, advice, etc.) on how certain IDE templates (e.g. in Eclipse, IntelliJ, and NetBeans) are instantiated internally by the IDEs, and I'm having some trouble.
I'm hoping, perhaps optimistically, that I can automatically generate multiple (at least two) distinct samples of each pattern from templates written in the associated grammars.
Every pattern-parameter (including cursors) must be filled, and samples for the same pattern should only have non-pattern-parameter content in common.
At this stage, they need to be syntactically valid so that they can be parsed, but do not need to be fully semantically valid/compilable snippets.
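To illustrate what I mean, take Eclipse's built-in for-loop template: filling its parameters two different ways should yield two samples that share only the template's fixed content. A minimal, purely illustrative sketch in Java (the filler values are invented, and real IDE template engines resolve variables with context-aware resolvers rather than plain substitution):

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: fill each ${name} parameter of an Eclipse-style template
// from a map of values. Real IDE template engines are considerably smarter.
public class TemplateFiller {
    private static final Pattern PARAM = Pattern.compile("\\$\\{(\\w+)\\}");

    static String instantiate(String template, Map<String, String> fillers) {
        Matcher m = PARAM.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(fillers.getOrDefault(m.group(1), "")));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String t = "for (int ${index} = 0; ${index} < ${array}.length; ${index}++) { ${cursor} }";
        // Two samples that share only the non-parameter content of the template:
        System.out.println(instantiate(t, Map.of("index", "i", "array", "names", "cursor", "use(names[i]);")));
        System.out.println(instantiate(t, Map.of("index", "k", "array", "totals", "cursor", "sum += totals[k];")));
    }
}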
If anyone knows how any of these IDEs work internally, and can tell me if/how I might be able to do this (or can point me towards sufficient documentation), I would greatly appreciate it.
Background/Context
I'm trying to create a research dataset for a pattern mining task - specifically, for mining code templates. I've been looking into it for some time and, as far as I'm aware, there isn't a suitable precedent dataset, so I have to make one.
Rather than painstakingly defining every feature of every pattern myself, I'm writing tools to partially automate the process - specifically, to derive candidate patterns from samples, and to filter out any candidates not observed in the actual corpus. The tools are input-language-agnostic, but I am initially targeting Java ASTs via the Eclipse JDT.
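(For the syntactic-validity check, I'm currently assuming something along these lines with the JDT DOM - a sketch only, requiring the org.eclipse.jdt.core bundle and its dependencies on the classpath:)

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTParser;

// Sketch: parse a generated sample as a run of statements and accept it only
// if JDT did not flag the resulting AST as malformed or recovered.
// (A thorough check would also visit child nodes for these flags.)
public class SampleChecker {
    static boolean parsesAsStatements(String sample) {
        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_STATEMENTS);
        parser.setSource(sample.toCharArray());
        ASTNode node = parser.createAST(null);
        return (node.getFlags() & (ASTNode.MALFORMED | ASTNode.RECOVERED)) == 0;
    }
}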
My thinking is that well-established patterns such as idioms and IDE code templates, from sufficiently reputable sources, are rational and intuitive pattern candidates with which I can, at least, evaluate recall. I can, and will, define some target-sample sets manually. However, I would prefer to generate them automatically, so that I can collect more complicated templates en masse (e.g. those published by IDE community members).
Thanks in advance,
Marcos C-S

Related

Java Grammar syntax analyzer (ASCII to graph)

I am developing an assistant for typing database commands for DBAs, because these commands have many parameters, and an assistant would help a lot with their job. For this assistant, I need the grammar of the commands, but database vendors (Oracle, DB2) do not provide that information in any format; the only thing available is the documentation.
One example of a DB2 command is: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001933.html
For this reason, I am trying to analyze the grammar diagrams, or railroad diagrams (http://en.wikipedia.org/wiki/Syntax_diagram), but I have not found anything in Java that could help me. I would like a reverse-engineering tool that takes the ASCII (textual) representation of the grammar and creates a graph in Java. Then, with the graph in Java, the assistant could propose options for the currently typed command.
An example of the assistant: http://www.youtube.com/watch?v=5sBoUHJupvs
If you have information about how to analyze grammar diagrams with Java (not generate them), I would appreciate it.
The closest tool I've seen is the Grammar Recovery System by Ralf Lammel. It depends on the railroad diagrams being accessible as text strings, which is generally not how they are found. You appear to be lucky in the DB2 case, and Ralf's work points in the right direction.
Considering that such diagrams are usually rendered as just a set of pixels (PLSQL's are like this in the PDF files provided as documentation), you have several sets of problems: recognizing graphical entities from pixels, assembling them into actual representations of the railroad diagrams, and then using those in your assistant.
I think this is a long, hard, impractical approach. If you got it to work, you'd discover the diagrams are slightly wrong in many places (read Ralf's paper or find out the hard way), and therefore unusable for a tool that is supposed to produce the "right" stuff to help your DBAs.
Of course, you are objecting to the other long, hard, "impractical" approach of reading the documentation and producing grammars that match, and then validating those grammars against the real world. Yes, this is a tough slog too, but it actually does produce useful results. You need to find vendors that have done this and will make it available to you.
ANTLR.org offers a variety of grammars. Have you checked there?
My company offers grammars and tools for processing them. We have done this for PLSQL and SQL2011 but not yet DB2.
Given a grammar, you now need to use it to provide "advice" to your users. Your users aren't going to type in a complete "program"; they want to generate fragments (e.g., SELECT statements). Now you need a parser that will process grammar fragments and at least say "legal" or "not". Most won't do that. Our DMS Software Reengineering Toolkit will do that.
To provide advice, you need to be able to walk the grammar (much as you considered for railroad diagrams) to compute "what is legal next". That's actually pretty hard (and in fact it is roughly equivalent to what an LR/GLR parser generator does when building its tables). Our DMS engine does this during syntax-error repair by traversing its GLR parse tables (since that work is already encoded in the tables!). That's not easy to do, as it is a peculiar variant of the GLR parsing algorithm. You might do better with an Earley parser, which keeps around all possible parses as a set of choices; you could simply inspect each one.
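As a toy illustration of what "walking the grammar" means (this is just a follow-set graph over invented tokens; a real implementation needs a stack for nested constructs, which is exactly where the LR/GLR machinery comes in):

import java.util.*;

// Toy sketch: a grammar fragment kept as a graph of token successors,
// consulted to suggest what may legally come next after a typed prefix.
public class NextTokenAdvisor {
    private final Map<String, Set<String>> follows = new HashMap<>();

    void allow(String token, String... next) {
        follows.computeIfAbsent(token, k -> new HashSet<>()).addAll(Arrays.asList(next));
    }

    Set<String> legalNext(List<String> typedSoFar) {
        String last = typedSoFar.isEmpty() ? "<start>" : typedSoFar.get(typedSoFar.size() - 1);
        return follows.getOrDefault(last, Collections.emptySet());
    }

    public static void main(String[] args) {
        NextTokenAdvisor g = new NextTokenAdvisor();
        // A tiny, invented slice of a DB2-like BACKUP command:
        g.allow("<start>", "BACKUP");
        g.allow("BACKUP", "DATABASE");
        g.allow("DATABASE", "<database-name>");
        g.allow("<database-name>", "ONLINE", "TO");
        System.out.println(g.legalNext(Arrays.asList("BACKUP", "DATABASE"))); // [<database-name>]
    }
}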
But this looks like quite a lot of work, and I think you'll be surprised by the amount of machinery you need.
The best work in this area is Harmonia, which produces incremental editors for code. Our DMS engine's parser is based on earlier work done by this project, because we are interested in the incrementality aspect.
You can try using ANTLR http://www.antlr.org/
It will not be able to understand an ASCII representation of the grammar, but it is powerful enough to do anything else you need, if you don't mind spending the time to learn the software.
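Driving a generated parser from Java looks roughly like this (a sketch against the ANTLR 4 runtime; MyCmdLexer and MyCmdParser are hypothetical classes ANTLR would generate from a hand-written MyCmd.g4 grammar, and 'command' a hypothetical start rule):

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

// Sketch only: MyCmdLexer/MyCmdParser would be generated by ANTLR from a
// grammar you write by hand after reading the vendor documentation.
public class CommandChecker {
    public static void main(String[] args) {
        MyCmdLexer lexer = new MyCmdLexer(CharStreams.fromString("BACKUP DATABASE sales"));
        MyCmdParser parser = new MyCmdParser(new CommonTokenStream(lexer));
        parser.command(); // hypothetical start rule
        // Zero syntax errors means the typed command is legal so far:
        System.out.println(parser.getNumberOfSyntaxErrors() == 0);
    }
}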

What is the difference between Acceleo and Xpand?

I have a DSL which is based on a custom metamodel, which in its turn is based on EMF/Ecore. I am trying to figure out which solution to choose, and I can't find any decent comparisons anywhere.
Does anyone have any reasons why I should choose one over the other?
What I know so far is that Acceleo uses an OMG-standardized language, but it seems harder to use than Xpand.
First of all, I wonder why you consider Acceleo more difficult to learn than Xpand. While the two languages have differences (blocks and delimiters, for example), they have quite a similar structure. I won't detail all the elements of both languages, but, for example, I don't see much of a difference between something like:
«FOREACH myAttributes AS a»«a.name»«ENDFOREACH»
and
[for (a: Attribute|myAttributes)][a.name/][/for]
Both are template-based languages, and as such they have much the same structure. The main differences between Acceleo and Xpand come from the fact that Acceleo is based on the OMG standards MOFM2T and OCL, and from the tooling.
I am not very familiar with the Xpand tooling, but you can find more about it on their wiki. Acceleo, on the other hand, provides an editor with syntax highlighting, code completion, error detection, refactoring and more. It also includes a debugger, a profiler, and Ant and Maven support. You can easily deploy your generators as Eclipse plug-ins for other users, or use them outside Eclipse in a regular Java application. You can find more information on Acceleo here. You can see most of the features of Acceleo in videos on the Obeo Network (registration required).
Finally, the latest activity on Xpand occurred a year ago, while Acceleo is actively developed. You can even follow the Acceleo development on GitHub if you want.
Stephane Begaudeau
Disclaimer: I am one of the members of the Acceleo dev team.
I am a dabbler, not an expert.
My impression is that if you need little more than a templating language, then Xpand is the way to go. Otherwise, pick Acceleo - but as you say, the learning curve is very steep.
When do you need more than a templating language? For me, they seem to run out of gas when the structure (not content) of the output is dependent on multiple independent pieces of the input. If you don't want to get into Acceleo, but have one of these cases, consider inventing an auto-generated "shim" language that gets you partway from input language to output language, perhaps with a lot of redundancy in it to avoid lookups at template-generation time.
I've been using the old 2.x Acceleo on a full-scale project and have done some tests with the new one.
The language is pretty easy to use, but with the new version it's a little more difficult to bind Java code to your template when the scripting language is not enough.
I was a very big fan of 2.x, but with 3.x I had lots of trouble making it work. You have to write Java code to handle Eclipse resources, for instance. I totally gave up when updating to Juno: my Acceleo projects didn't work anymore, and I didn't manage to fix them in two days. I hope they will make it easier to use out of the box.
Basically, the main difference is that Acceleo is an implementation of the MOF Models to Text Transformation Language, which is the OMG (Object Management Group) standard for the definition of model-to-text transformations. It is therefore a standard language designed by the same group who designed MOF, UML, SysML and MDA in general. Xpand is a language which I guess existed before the standard, but it is now different from it.
If you start from scratch then start with Acceleo.
In my case, I use a custom metamodel (derived from UML2) with custom stereotypes and stereotype properties. I tried both the Acceleo and Xpand template languages. Indeed, they are pretty similar in terms of structure and capabilities.
However, I can see one big difference (which makes Xpand much better in this use case): you can use your custom stereotypes in your Xpand templates.
Xpand engine brilliantly chooses the "best matching template/rule" for every stereotype (taking into account inheritance between stereotypes as well).
Furthermore, it is very easy to obtain stereotype properties.
These two "features" make the templates very elegant, compact and readable.
For example:
«DEFINE myTemplate FOR MyUmlProfile::MyStereoType»
MyValue: «this.myStereotypeProperty» or simply: «myStereotypeProperty»
«ENDDEFINE»
In Acceleo, I found it clumsy to achieve the same (longer statements, more code) and my templates ended up lengthy and complex. The positive thing about Acceleo, however, was that it worked conveniently from IBM RSA (applied directly to RSA (emx) models). It has code highlighting and auto-complete working nicely.
Xpand only worked if I exported my RSA models to ".uml" (~XML) format. It doesn't offer code highlighting or auto-complete (or at least I didn't figure out how).
Considering all pros and cons, I still vote for Xpand (in my use case).

Developing a game with complex (and undefined) rules?

How would you develop a game that could end up with complex rules, where you need to work on the rules (add, tweak, balance) a lot? I have looked at rule-based languages, but I haven't found information about them that is useful enough for this.
UI etc. will be developed later; first I need to iteratively develop the rules and formulas and test them out between iterations. The game in question would be a tactical two-player game where players select "troops", and a large part of the game is choosing the correct troop setup. So the rules could be something like:
If the attacker's skill A is greater than the defender's skill B, and the defender does not have extra skill Z, then ...
That's obviously a very simple rule, I expect there to be dozens, if not hundreds of rules, with paths (if A then if B....).
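For illustration, in Java a single rule of that kind might be sketched as a guard plus an effect (all names here are invented):

import java.util.List;

// Invented sketch of one rule: a guard (applies?) plus an effect (apply).
class Troop {
    int skillA, skillB;
    List<String> extraSkills;
}

interface Rule {
    boolean applies(Troop attacker, Troop defender);
    void apply(Troop attacker, Troop defender);
}

class SkillAdvantageRule implements Rule {
    public boolean applies(Troop attacker, Troop defender) {
        return attacker.skillA > defender.skillB
                && !defender.extraSkills.contains("Z");
    }
    public void apply(Troop attacker, Troop defender) {
        // ... the rule's effect on the combat outcome would go here ...
    }
}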
For testing, I would write a framework that can run the rule sets through a large number of iterations, with logging that lets me see how the latest changes affected balance. It would also be useful to be able to define acceptable values and to see the effect of changes easily. What tools are there for this?
The language of choice is either Python or Java (depending on whether I want to target Android or not - probably I will).
Whatever you do, in this sort of program you'll end up implementing half of Prolog anyway. Check out the forward-chaining, backward-chaining, and backtracking algorithms.
A pure-Java version of Prolog called Jekejeke was recently released. I can't comment on its quality.
Python makes it much easier than Java to implement backtracking using generators and yield statements.
As with larsmans, I'm going to suggest that you use Prolog for your rules development. It's just the best language in semi-common use for doing exactly the kind of thing you want to do. I will, however, instead recommend tuProlog as looking like the ideal environment for your needs, given that you want to write the game in Java. tuProlog is intended as an embeddable Prolog environment (it can be embedded in Java or in .NET) with very finely-tuned library inclusion, so you don't have to carry around a whole, bloated Prolog world with you when you're using just small parts of it.
Here's the blurb from the web site:
tuProlog is a light-weight Prolog system for distributed applications and infrastructures, intentionally designed around a minimal core (containing only the most essential properties of a Prolog engine), to be later configured by (statically and dynamically) loading/unloading libraries of predicates. tuProlog also natively supports multi-paradigm programming, providing a clean, seamless integration model between Prolog and mainstream object-oriented languages -- namely Java, for tuProlog Java version, and any .NET-based language (C#, F#..), for tuProlog .NET version. It is also easily deployable, just requiring the presence of a Java/CLR virtual machine and an invocation upon a single self-contained archive file. Interoperability is further developed along the two main lines of Internet standard patterns and coordination models.
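To give a feel for the embedding, here is a minimal sketch against the tuProlog 2.x Java API (check the current distribution, as details may have changed; the rule and facts are invented):

import alice.tuprolog.Prolog;
import alice.tuprolog.SolveInfo;
import alice.tuprolog.Theory;

// Minimal sketch: load an invented rule base and query it from Java.
public class RuleEngineDemo {
    public static void main(String[] args) throws Exception {
        Prolog engine = new Prolog();
        engine.setTheory(new Theory(
                "beats(Atk, Def) :- skill_a(Atk, A), skill_b(Def, B), A > B.\n"
              + "skill_a(archer, 7).\n"
              + "skill_b(knight, 5).\n"));
        SolveInfo result = engine.solve("beats(archer, knight).");
        System.out.println(result.isSuccess()); // true
    }
}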
It's an interesting question, though a bit broad and generic. I'll try to answer it according to my interpretation of what you're asking ;)
Your main concern seems to be about being able to express the rules in a concise, readable way that would allow you to always keep the definition clear and change it rapidly. Since you say the rules are probably going to get complex, I believe your best bet would be to write a DSL for them.
Try writing a grammar that would be sufficient for describing your rules, and then see how to plug it into your game. ANTLR could be very helpful there, especially because it supports both Java and Python.
Try "The A.I. of F.E.A.R.". There, an FSM (finite state machine) was combined with A* pathfinding: instead of finding a path through terrain, the engine found chains of goals to generate "intelligent" agent behaviour on the fly. Maybe there is something inspiring in there for you.

Intelligent search and generation of Java code, preferably using Python?

Basically, I do lots of one-off code generation, large-scale refactorings, etc. etc. in Java.
My tool language of choice is Python, but I'll take whatever solutions you can offer.
Here is a simplified illustration of what I would like, in pseudocode:
Generating an implementation for an interface
search within my project:
    for each Interface as iName:
        write class(name=iName+"Impl", implements=iName)
        search within the body of iName:
            for each Method as mName:
                write method(name=mName, body="// TODO implement this...")
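(For concreteness, a rough sketch of that first loop using the Eclipse JDT DOM - illustrative only; real code would also handle method parameters, nested types, enums, and imports:)

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MethodDeclaration;
import org.eclipse.jdt.core.dom.TypeDeclaration;

// Sketch: for each interface in a source file, print a skeletal Impl class.
public class ImplStubber {
    static void printStubs(String source) {
        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setSource(source.toCharArray());
        CompilationUnit cu = (CompilationUnit) parser.createAST(null);
        for (Object t : cu.types()) {
            if (!(t instanceof TypeDeclaration)) continue;
            TypeDeclaration type = (TypeDeclaration) t;
            if (!type.isInterface()) continue;
            System.out.println("class " + type.getName() + "Impl implements " + type.getName() + " {");
            for (MethodDeclaration m : type.getMethods()) {
                System.out.println("    public " + m.getReturnType2() + " " + m.getName()
                        + "() { /* TODO implement this... */ }");
            }
            System.out.println("}");
        }
    }
}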
Basically, the tool I'm searching for would allow me to:
parse files according to their Java structure ("search for interfaces")
search for words contextualized by language elements and types ("variables of type SomeClass", "doStuff() method calls on SomeClass instances")
run searches with structural context ("within the body of the current result")
easily replace or generate code (with helpers to generate, as above, or functions for replacing, "rename the interface to Foo", "insert the line Blah.Blah()", etc.)
The point is, I don't want to spend a lot of time writing these things, as they are usually throwaway. But sometimes I need something just a little smarter than what grep offers. It wouldn't be too hard to write up a simplistic version of this, but if I'm going to use something like this at all, I'd expect it to be robust.
Any suggestions of a tool/library that will help me accomplish this?
Edit to add some clarification
Python is definitely not necessary; I'll take whatever there is. I merely suggest it in case there are choices.
This is to be used in combination with IDE refactoring; sometimes it just doesn't do everything I want.
In the instances where I'm using it for code generation (as above), it's for augmenting the output of other code generators. E.g. a library we use outputs a tonne of interfaces, and we need to make standard implementations of each one to mesh it with our codebase.
First, I am not aware of any tools or libraries implemented in Python that are specifically designed for refactoring Java code, and a Google search did not give me any leads.
Second, I would posit that writing a decent tool or library for refactoring Java in Python would be a large task. You would have to implement a Java compiler front-end (lexer/parser, AST builder and type analyser) in Python, then figure out how to integrate it with a program editor. I'm not surprised that nobody has done this ... given that mature alternatives already exist.
Thirdly, refactoring without a full analysis of the source code (using pattern matching, for example) will be incapable of complex refactorings, and is likely to make mistakes in edge cases that the implementor did not think of. I expect that is the level at which the OP is currently operating ...
Given that bleak outlook, what are the alternatives:
One alternative is to use one of the existing Java IDEs (e.g. NetBeans, Eclipse, IDEA, etc.) as a refactoring tool. The OP won't be able to extend the capabilities of such a tool in Python code, but the chances are that he won't really need to. I expect that at least one of these IDEs does 95% of what he needs, and (if he is realistic) that should be good enough. Especially when you consider that IDEs have lots of incidental features that make refactoring easier; e.g. structured editing, undo/redo, incremental compilation, intelligent code completion, intelligent searching, type and call hierarchy views, and so on.
(Aside ... if existing IDEs are not good enough (#WizardOfOdds - only the OP can make that call!!), it would make more sense to try to extend the refactoring capability of an existing IDE than start again in a different implementation language.)
Depending on what he is actually doing, model-driven code generation may be another alternative. For instance, if the refactoring is happening because he is frequently creating and recreating his object model(s), then an alternative is to code the models in some modeling language and generate his code from those models. My tool of choice when doing this kind of thing is Eclipse EMF and related technologies. The EMF technologies include generation of editors, XML serialization, persistence, queries, model to model transformation and so on. I have used EMF to implement and roll out projects with object models consisting of 50 to 100 distinct classes with complex relationships and validation requirements. EMF's support for merging source code edits when you regenerate from an updated model is a key feature.
If you are coding in Java, I strongly recommend that you use the NetBeans IDE. It has this kind of refactoring support built in. Eclipse also supports this kind of thing (although I prefer NetBeans). Both projects are open source, so if you want to see how they perform this refactoring, you can look at their source code.
Java gets its fair share of criticism these days, but in the area of tooling it isn't justified.
We are spoiled for choice; Eclipse, NetBeans, and IntelliJ are the big three IDEs. All of them offer excellent searching and refactoring. Eclipse has the edge on NetBeans, I think, and IntelliJ is often ahead of Eclipse.
You can also use static analysis tools such as FindBugs and Checkstyle to find issues - e.g. excessively long methods and classes, or overly complex code.
If you really want to leverage your Python skills, take a look at Jython. It's a Python interpreter written in Java.

Code understanding, reverse engineering, best concepts and tools. Java

One of the most demanding tasks for any programmer or architect is understanding someone else's code.
For example, say I am a contractor hired to rescue a project very quickly: fix bugs, plan a global refactoring. For that, I need the most efficient way to understand the code. What is the list of concepts, their priority, and the best tools for this?
What I know of: reverse-engineering code into object models (creating a diagram per package is not so convenient); creating sequence diagrams (the tool connects to the system in debug mode and generates diagrams from runtime behaviour); various visualization techniques; using tools that work not just with .java files but also with, e.g., JPA implementors like Hibernate; generating a diagram not for the whole codebase, but for a chosen class plus the classes it uses.
Is Sparx Enterprise Architect the state of the art in reverse engineering, or far from it? Any other, better tools? Ideally, the tool would make me understand the code as if I had written it myself :)
The book Object-Oriented Reengineering Patterns deals with this in detail. Unfortunately there is no silver bullet attached :-)
However, it lists a lot of useful techniques for taking over legacy code. In brief
interview at least some of the original developers (if they are still around) about
development history: phases, releases
current state of affairs
team social structure, politics, dynamics: when and why did people join and leave
bugs: typical, easiest, hardest
code quality: cleanest / ugliest parts
configuration data: form, content and usage
unit / integration / manual / ... test cases and data
SCM branch structure and usage
documentation: what is documented where, is it up to date
contact persons for external interfaces
Watch developers / users during a demo to find
main features
typical use cases
usage anecdotes
good / bad, missing / superfluous functionality
"read all the code in one hour"
get high level view of class hierarchies, interfaces
take multiple sessions if needed
identify large structures (these often contain important functionality)
look for design patterns
check comments (they can reveal a lot, but may be also misleading)
skim documentation (if there is any)
just record the availability of specific types of docs, e.g. specification, UML diagrams, wiki, Javadoc, etc.
is it useful and why (not)
is it up to date
By far the most important tools are your ears, your tongue and your larynx. Ask the people who are familiar with the code - they'll be able to help you understand its general architecture much better than any software tools.
Automatically reverse-engineered complete UML models are generally nearly useless because they cannot distinguish between important abstractions and implementation details - which is the whole point of such models.
Software tools are more useful for answering very specific questions when you are investigating details, such as "where is this method called from?" or "what classes implement this interface?" - any good IDE will be able to do that. Debuggers can help too - placing breakpoints at key points of the code and looking at the call stack when they're hit is often very enlightening.
Just to elaborate on Michael's mention of good IDEs which can help you:
I use the following Eclipse facilities a lot:
Shift-F2 when the cursor is placed in an identifier brings up the Javadoc for that identifier, if any. Good for navigating.
Hovering the mouse over an identifier brings up a box with the Javadoc in it, if any. Good for reminding when writing e.g. a method call.
The Declaration view shows the source where the identifier under the cursor is defined. This is updated as the cursor moves.
F3 goes to the definition of the current identifier.
Ctrl-T on an identifier shows all subclasses and implementations in a popup. Very useful when working on interfaces.
F4 on an identifier brings up the implementation hierarchy of that identifier in a panel, which can be navigated. Very useful to learn how things are connected. This includes both classes and interfaces.
EclipseUML Omondo is the best Java reverse-engineering tool. It reverses all the Java code, all packages, and even class interactions with interfaces, even when they are not in the same package. Just amazing.
You can also reverse:
- .class
- Hibernate annotations
- JPA annotations
What I like about this tool is that my code stays clean, because all the model information is saved in an XMI format and not as tags in my code. You can also create small pieces of documentation inside each existing package, using diagrams as views of the model. Just marvelous, and it respects the official UML 2.2 specification.
The only problem is that it is really too expensive, so the price is a showstopper for me!
It doesn't extract high-level architectures, but it does make it much easier to climb around your Java code: our Java Source Code Browser. It reads source code (and supporting class files) and produces Javadoc-style documentation, plus source text bi-directionally hyperlinked to the Javadoc information.
(I'm one of the principals behind it).
I use Enterprise Architect for all my UML work (including reverse engineering with Java) and it works perfectly.
