AST builders for Java? - java

I am building an AST tree by hand to use with my application. I currently have a lot of data in my program's memory using a standard OO approach, that will use to form an AST.
I was wondering if by chance there are already any frameworks / code generators that could help me with this task.
I am not looking for a compiler compiler. I don't want to define a grammar and have the code generator generate me a parser for it. I intend to instantiate the nodes of the tree by myself, I am only looking for a faster and cheaper way to build the .java files themselves (a plus would be having options for the node's attributes, optional beginVisit() / endVisit() methods, etc).

I would highly recommend that you take a look at Eclipse's Java Development Tools. It includes a very robust AST framework.
My understanding is that with using this API, you would have access to all attributes of the various types of AST Nodes and you can also create visitors with beginVisit() and endVisit() methods.

This seems to be the answer to the question:
http://www.jetbrains.com/mps/
The major goal of MPS is to allow extending languages. This is because
every existing language already has a strict language syntax defined,
which limits its flexibility.
The problem in extending language syntax is mainly the textual
presentation of code. This is especially true if we want to use
different language extensions, where each one may have its own syntax.
This naturally leads to the idea of non-textual presentation of
program code. A major benefit of this approach is that it eliminates
the need for code parsing. Our solution is to have code always
maintained in an Abstract Syntax Tree (AST), which consists of nodes
with properties, children and references, and fully describes the
program code.
At the same time, MPS offers an efficient way to keep writing code in
a text-like manner.
In creating a language, you define the rules for code editing and
rendering. You can also specify the language type-system and
constraints. This allows MPS to verify program code on the fly, and
thus makes programming with the new language easy and less
error-prone.
MPS uses a generative approach. You can also define generators for
their language to transform code in the custom language to compilable
code in some conventional language. Currently, MPS is particularly
good for, but is not limited to, generating Java code. You can also
generate XML, HTML, JavaScript, and more.

Related

Complete metaprogramming framework for Java?

I'm interested in metaprogramming (i.e. programs that help programmers do tedious programming tasks). I'm looking for a tool which has the following properties:
usable both at compile time and runtime;
inspects program structure;
can add new classes, methods or fields and make them visible to Java compiler;
can change behavior of methods;
Java-based (well, Java is most popular programming language according to some rankings);
good integration with IDEs and build tools like Ant, Gradle or Maven;
actively maintained project;
easy to use and extend;
There are some solutions for this, like:
reflection
AspectJ
Annotation Processing Tool
bytecode manipulation (CGLIB, Javassist, java.lang.instrument)
Eclipse JDT
Project Lombok
Groovy, JRuby, Scala
But unfortunately none of them meets all the criteria above. Is there any complete metaprogramming solution for Java?
There's JackPot, which is Java based but I don't think gets a lot a current attention. Has ASTs and symbol tables AFAIK. You can probably extend it; I doubt anybody will stop (or help) you.
There's the Java-based compiler APIs for the Sun, er, Oracle java compiler. They're likely actively maintained, but I don't think you can modify source code and regenerate it. Certainly has symbol tables; dunno about trees. Probably pretty hard to extend; you have to keep up with the compiler guys, not the other way round.
There is ANTLR, which has a Java implementation and a Java parser that will build ASTs. I don't think it has full symbol tables, so doing serious code analysis/revision is likely to be hard. ANTLR is certainly actively maintained, and nobody will object to you enhancing the Java grammar with symbol tables. Just know that will take you about 6 months for Java 1.6 if that's all you do. (That's how long it took our internal [smart] guy to do it for DMS, starting with symbol table support for 1.4).
Not in Java, and not easily integrated into IDEs, but capable of carrying massive analysis and transformation on Java code is our DMS Software Reengineering Toolkit with its Java Front End.
DMS is generic compiler machinery: parsing, AST building, symbol table machinery, flow analysis machinery, with that additional bonuses of source-to-source transformations and generic prettyprinting of ASTs back to legal text including retention of comments. It offers a set of APIs supporting these services, and additional tools for defining grammars and langauge-dependent flow analyzers.
The Java Front End gives crucial detail (using those APIs) to DMS to allow it process Java: a grammar/parser, full symbol table construction for Java 1.4-1.6 (with 1.7 due momentarily), as well as some control and data flow analysis (to be extended over time because this stuff is so useful).
By using the services provided by DMS and the Java Front end, one can reasonably contemplate building arbitrary Java anlaysis and transformation tools. (This makes the tool a "complete" metaprogramming tool, in that it can inspect any language structure, or change any language structure, as opposed to say template metaprogramming or reflection). We believe this to be much more effective than ad hoc tools because you don't have to build the infrastructure, the infrastructure provided is robust and handles cases you don't have the energy to implement, and it is designed to support such tasks. YMMV.
DMS/Java Front end have been used to construct a variety of Java tools: test coverage, profilers, dead code elimination, clone detection on scale, JavaDoc with hyperlinked source-code, fast XML parser/generators, etc.
Yes, its actively maintained; undergoing continuous enhancement since the first version in 1998.
There's a Java metaprogramming framework that is part of Tapestry IOC, it's called Plastic. It munges class bytecodes using custom classloaders, I haven't tried it yet but it looks like it gives a simple interface that still enables the programmer to make powerful metaprogramming changes.
Check out the Meta Programming System:
http://www.jetbrains.com/mps/
It has great IDE support and is used quite frequently by the smart folks at JetBrains.
Check out Spring Roo.

Developing a game with complex (and undefined) rules?

How would you develop a game that could end up with complex rules but you need to work on the rules (add, tweak, balance) a lot? I have looked at rule-based languages but I haven't found quite useful enough information regarding this.
UI etc. will be developed later, first I'd need to iteratively develop the rules and formulas and test them out between iterations. The game in question would be a tactical two-player game where players select "troops" and a large part of the game is choosing the correct troop setup. So the rules could be something like
If attacker's skill A is greater than defenders skill B and defender does not have extra skill Z then ...
That's obviously a very simple rule, I expect there to be dozens, if not hundreds of rules, with paths (if A then if B....).
For testing I would write a test framework that can run the rule sets through with large number of iterations and logging that allows me to see how the latest changes affected balance. It would also be useful to be able to define acceptable values and a possibility to see easily the changes. What tools are there for this?
The language of choice is either Python or Java (depending on whether I want to target Android or not - probably I will).
Whatever you'll do, in this sort of program you'll end up implementing half of Prolog anyway. Check out the forward-chaining, backward-chaining, and backtracking algorithms.
A pure-Java version of Prolog called Jekejeke was recently released. I can't comment on its quality.
Python makes it much easier than Java to implement backtracking using generators and yield statements.
As with larsmans I'm going to suggest that you use Prolog for your rules development. It's just the best language in semi-common use for doing exactly the kind of thing you want to do. I will, however, instead recommand tuProlog as looking like the ideal environment for your needs given that you want to write the game in Java. TuProlog is intended as an embeddable Prolog environment (can be embedded in Java or in .NET) with very finely-tuned library inclusion so you don't have to carry around a whole, bloated Prolog world with you when you're using just small parts of it.
Here's the blurb from the web site:
tuProlog is a light-weight Prolog system for distributed applications and infrastructures, intentionally designed around a minimal core (containing only the most essential properties of a Prolog engine), to be later configured by (statically and dynamically) loading/unloading libraries of predicates. tuProlog also natively supports multi-paradigm programming, providing a clean, seamless integration model between Prolog and mainstream object-oriented languages -- namely Java, for tuProlog Java version, and any .NET-based language (C#, F#..), for tuProlog .NET version. It is also easily deployable, just requiring the presence of a Java/CLR virtual machine and an invocation upon a single self-contained archive file. Interoperability is further developed along the two main lines of Internet standard patterns and coordination models.
It's an interesting question, though a bit broad and generic. I'll try to answer it according to my interpretation of what you're asking ;)
Your main concern seems to be about being able to express the rules in a concise, readable way that would allow you to always keep the definition clear and change it rapidly. Since you say the rules are probably going to get complex, I believe your best bet would be to write a DSL for them.
Try to writing a grammar that would be sufficient for describing your rules and then see how to plug it into your game. ANTLR could be very helpful there, especially because it supports both Java and Python.
try The A.I. of F.E.A.R.. there FSM (finite state machine) was combined with A* pathfinding. instead of finding a path in a terrain the engine found chains of goals to implement generation of "intelligent" behaviour of agents on the fly. maybe there is something inspiring in there for you.

Intelligent search and generation of Java code, preferrably using Python?

Basically, I do lots of one-off code generation, large-scale refactorings, etc. etc. in Java.
My tool language of choice is Python, but I'll take whatever solutions you can offer.
Here is a simplified illustration of what I would like, in a pseudocode
Generating an implementation for an interface
search within my project:
for each Interface as iName:
write class(name=iName+"Impl", implements=iName)
search within the body of iName:
for each Method as mName:
write method(name=mName, body="// TODO implement this...")
Basically, the tool I'm searching for would allow me to:
parse files according to their Java structure ("search for interfaces")
search for words contextualized by language elements and types ("variables of type SomeClass", "doStuff() method calls on SomeClass instances")
to run searches with structural context ("within the body of the current result")
easily replace or generate code (with helpers to generate, as above, or functions for replacing, "rename the interface to Foo", "insert the line Blah.Blah()", etc.)
The point is, I don't want to spend a lot of time writing these things, as they are usually throwaway. But sometimes I need something just a little smarter than what grep offers. It wouldn't be too hard to write up a simplistic version of this, but if I'm going to use something like this at all, I'd expect it to be robust.
Any suggestions of a tool/library that will help me accomplish this?
Edit to add some clarification
Python is definitely not necessary; I'll take whatever is that. I merely suggest it incase there are choices.
This is to be used in combination with IDE refactoring; sometimes it just doesn't do everything I want.
In instances where I'm using for code generation (as above), it's for augmenting the output of other code generators. e.g. a library we use outputs a tonne of interfaces, and we need to make standard implementations of each one to mesh it to our codebase.
First, I am not aware of any tool or libraries implemented in Python that specifically designed for refactoring Java code, and a Google search did not give me any leads.
Second, I would posit that writing such a decent tool or library for refactoring Java in Python would be a large task. You would have to implement a Java compiler front-end (lexer/parser, AST builder and type analyser) in Python, then figure out how to integrate this with a program editor. I'm not surprised that nobody has done this ... given that mature alternatives already exist.
Thirdly, doing refactoring without a full analysis of the source code (but uses pattern matching for example) will be incapable of doing complex refactoring, and will is likely to make mistakes in edge cases that the implementor did not think of. I expect that is the level at which the OP is currently operating ...
Given that bleak outlook, what are the alternatives:
One alternative is to use one of the existing Java IDEs (e.g. NetBeans, Eclipse, IDEA. etc) as a refactoring tool. The OP won't be able to extend the capabilities of such a tool in Python code, but the chances are that he won't really need to. I expect that at least one of these IDEs does 95% of what he needs, and (if he is realistic) that should be good enough. Especially when you consider that IDEs have lots of incidental features that help make refactoring easier; e.g. structured editing, undo/redo, incremental compilation, intelligent code completion, intelligent searching, type and call hierarchy views, and so on.
(Aside ... if existing IDEs are not good enough (#WizardOfOdds - only the OP can make that call!!), it would make more sense to try to extend the refactoring capability of an existing IDE than start again in a different implementation language.)
Depending on what he is actually doing, model-driven code generation may be another alternative. For instance, if the refactoring is happening because he is frequently creating and recreating his object model(s), then an alternative is to code the models in some modeling language and generate his code from those models. My tool of choice when doing this kind of thing is Eclipse EMF and related technologies. The EMF technologies include generation of editors, XML serialization, persistence, queries, model to model transformation and so on. I have used EMF to implement and roll out projects with object models consisting of 50 to 100 distinct classes with complex relationships and validation requirements. EMF's support for merging source code edits when you regenerate from an updated model is a key feature.
If you are coding in Java, I strongly recommend that you use NetBeans IDE. It has this kind of refactoring support builtin. Eclipse also supports this kind of thing (although I prefer NetBeans). Both projects are open source, so if you want to see how they perform this refactoring, you can look at their source code.
Java has its fair share of criticism these days but in the area of tooling - it isn't justified.
We are spoiled for choice; Eclipse, Netbeans, Intellij are the big three IDEs. All of them offer excellent levels of searching and Refactoring. Eclipse has the edge on Netbeans I think and Intellij is often ahead of Eclipse
You can also use static analysis tools such as FindBugs, CheckTyle etc to find issues - i.e. excessively long methods and classes, overly complex code.
If you really want to leverage your Python skills - take a look at Jython. Its a Python interpreter written in Java.

parsing and translating from text to xml

I need to translate programs written in a domain specific language into xml representation. These programs are in the form of simple text file. What approach would you suggest me? What api should I use to:
Parse the text files written in this language.
Write xml based on the token and token streams I obtain.
My criteria is more of a rapid and easier development rather then memory or computing time efficiency.
Many Thanks
Ketan
The less trivial part of the job is with step #1, parsing the Domain Specific Language (DSL) text, rather than #2, pushing this to some XML language.
Hopefully you readily have a parser for the DSL (obviously this language must have been put to use somewhere...), and you may be able to "hook" your export/conversion logic into this parser. If such is not possible, you'll need to write a new parser.
Depending on the complexity of the DSL, you may be able to write, longhand, a simple parser based on a few loops and switch cases.
For more complicated languages, ANTLR is often a good choice. In a nutshell, one formalize the grammar of the DSL, in Backus Naur Form (BNF, or actually EBNF, here, i.e. the Extended family) and ANTLR produces a parser, written in a target language of choice (including Java). The learning curve with ANTLR is a factor to consider but in the context of a moderately to extremely sophisticated language, a well worth investment. ANTLR is similar but, in my opinion, a better tool than GNU Bison, this latter would however do the trick as well, and too, target Java is so desired.
If you are familiar with other languages, in particular Python, there are many other tools that can be put to use for more or less ad-hoc parsers; I've also used PyParsing and gladly recommend it.
XStream is the best XML serializer/deserializer for Java EVAR. If you can turn your DSL into Java classes, this is a great library to use.

Best design for generating code from an AST?

I'm working on a pretty complex DSL that I want to compile down into a few high level languages. The whole process has been a learning experience. The compiler is written in java.
I was wondering if anyone knew a best practice for the design of the code generator portion. I currently have everything parsed into an abstract syntax tree.
I was thinking of using a template system, but I haven't researched that direction too far yet as I would like to hear some wisdom first from stack overflow.
Thanks!
When I was doing this back in my programming languages class, we ended up using emitters based on following the visitor pattern. It worked pretty well - makes retargeting it to new output languages pretty easy, as long as your AST matches what you're printing fairly well.
What you really want is a program transformation system, that maps syntax structures in one language (your DSL) into syntax patterns in other langauges. Such a tool can carry out arbitrary transformations (tree-rewrites generalize string-rewrites which are Post systems which are full Turing capable) during the code generation project, which means that what you generate and how sophisticated your generation process is determined only by your ambition, not by "code generator framework" properties.
Sophtisticated program transformation systems combine various types of scoping, flow analysis and/or custom analyzers to enable the tranformations. This doesn't add any theoretical power, but it adds a lot of practical power: most real languages (even DSLs) have namespaces, control and data flow, need type inference, etc. etc.
Our DMS Software Reengineering Toolkit is this type of transformation system. It has been used to analyze/transform both conventional languages and DSLs, for simple and complex languages, and for small, large and even huge software systems.
Related to comments by OP about "turning the AST into other languages", that is accomplished by DMS by writing transformations that map surface syntax for the DSL (implemented behind the scenes his DSL's AST) to surface syntax for the target language (implemented using target language ASTs). The resulting target langauge AST is then prettyprinted automatically by DMS to provide actual source code in the target language, that corresponds to the target AST.
If you are already using ANTLR and have your AST ready you might want to take a look at StringTemplate:
http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation
Also Section 9.6 of The Definitive ANTLR Reference: Building Domain-Specific Languages explains this:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
The free code samples are available at http://media.pragprog.com/titles/tpantlr/code/tpantlr-code.tgz. In the subfolder code\templates\generator\2pass\ you'll find an example converting mathematical expressions to java bytecode.

Categories

Resources