Is there a Java implementation of Constraint Grammar?

Does anybody know of a Java implementation of Constraint Grammar for natural language processing? I know of the VISL CG3 implementation, which is in C++, and I could interface it from Java, but it would be easier if I could find a Java implementation, since it will be integrated into legacy Java code.
This will be used in a Portuguese open source grammar checker and should be compatible with the LGPL license.
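For reference, the interfacing route mentioned above usually means wrapping the vislcg3 command-line binary and piping tagged text through it. A minimal sketch, assuming vislcg3 is on the PATH; the -g flag and grammar file name are assumptions to check against the CG3 documentation:

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Cg3Wrapper {
    // Runs the external vislcg3 binary and pipes CG-format input through it.
    public static String disambiguate(String cgInput) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("vislcg3", "-g", "grammar.cg3");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (OutputStream os = p.getOutputStream()) {
            os.write(cgInput.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }
}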

Have a look at JAPE: Regular Expressions over Annotations, a formalism based on CPSL (Common Pattern Specification Language) from the old TIPSTER project.
It's not truly context-dependent (as a Constraint Grammar implementation should be), but it's possible to do context-dependent things with it. It is free and open source, and it has a lot of Java examples.
XTDL, from the SProUT project, is also worth a look. I'm not sure whether it's free or not.

I'm not sure if you are looking for regexes over semantic graphs and tree structures. If that's the case, you can check out Tregex and Semgrex, which match over Stanford constituency trees and dependency graphs, respectively.
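For example, a minimal Tregex sketch over a constituent tree (requires the Stanford Parser jar on the classpath; the tree and pattern are illustrative):

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

public class TregexDemo {
    public static void main(String[] args) {
        // Parse a bracketed constituent tree and find every NP that dominates an NN.
        Tree tree = Tree.valueOf("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))");
        TregexPattern pattern = TregexPattern.compile("NP < NN");
        TregexMatcher matcher = pattern.matcher(tree);
        while (matcher.find()) {
            System.out.println(matcher.getMatch());
        }
    }
}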

I haven't tried Graph-Expression, but the site states that it provides a language for the "structure of match -it is possible to build syntax tree based on match". I think this is comparable to JAPE (as the site states: "fast - it works faster then Jape transducer (gate.ac.uk) closest project to this one"). And I assume it can handle graphs, something JAPE may not be good at.

Related

What is the difference between Acceleo and Xpand?

I have a DSL based on a custom metamodel, which in turn is based on EMF/Ecore. I am trying to figure out which solution to choose, and I can't find any decent comparisons anywhere.
Does anyone have any reasons why I should choose one over the other?
What I know so far is that Acceleo uses an OMG-standardized language, but it seems harder to use than Xpand.
First of all, I wonder why you consider Acceleo more difficult to learn than Xpand: while the two languages have differences (blocks and delimiters, for example), they have quite similar structures. I won't detail all the elements of both languages, but, for example, I don't see much of a difference between something like:
«FOREACH myAttributes AS a»«a.name»«ENDFOREACH»
and
[for (a: Attribute|myAttributes)][a.name/][/for]
Both are template-based languages, and as such they have much the same structure. The main difference between Acceleo and Xpand comes from the fact that Acceleo is based on the OMG standards MOFM2T and OCL, and from its tooling.
I am not very familiar with the Xpand tooling, but you can find more about it on their wiki. Acceleo, on the other hand, contains an editor with syntax highlighting, code completion, error detection, refactoring and more. It also contains a debugger, a profiler, and Ant and Maven support. You can also easily deploy your generators as Eclipse plugins for other users, or use them out of Eclipse in a regular Java application. You can find more information on Acceleo here. You can see most of the features of Acceleo in the videos on the Obeo Network (registration required).
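To illustrate the standalone usage: an Acceleo 3 module ships with a generated Java launcher class (commonly named Generate) that can be invoked from a plain Java application, roughly as below. The launcher class name, model path, and output folder are assumptions for the example:

import java.io.File;
import java.util.ArrayList;
import org.eclipse.emf.common.util.BasicMonitor;
import org.eclipse.emf.common.util.URI;

public class RunGenerator {
    public static void main(String[] args) throws Exception {
        // "Generate" is the launcher Acceleo derives from the module; paths are illustrative.
        URI modelURI = URI.createFileURI("model/my.uml");
        File targetFolder = new File("src-gen");
        Generate generator = new Generate(modelURI, targetFolder, new ArrayList<Object>());
        generator.doGenerate(new BasicMonitor());
    }
}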
Finally, the latest activity on Xpand occurred a year ago, while Acceleo is actively developed. You can even follow the Acceleo development on GitHub if you want.
Stephane Begaudeau
Disclaimer: I am one of the members of the Acceleo dev team.
I am a dabbler, not an expert.
My impression is that if you need little more than a templating language, then Xpand is the way to go. Otherwise, pick Acceleo - but as you say, the learning curve is very steep.
When do you need more than a templating language? For me, they seem to run out of gas when the structure (not content) of the output is dependent on multiple independent pieces of the input. If you don't want to get into Acceleo, but have one of these cases, consider inventing an auto-generated "shim" language that gets you partway from input language to output language, perhaps with a lot of redundancy in it to avoid lookups at template-generation time.
I've been using the old 2.x Acceleo on a full-scale project and have done some tests with the new one.
The language is pretty easy to use, but with the new version it's a little more difficult to bind some Java code to your template when the scripting language is not enough.
I was a very big fan of the 2.x, but with the 3.x I had lots of trouble making it work. You have to write Java code to handle Eclipse resources, for instance. I totally gave up when updating to Juno: my Acceleo projects didn't work anymore, and I didn't manage to fix them in two days. I hope they will make it easier to use out of the box.
Basically, the main difference is that Acceleo is an implementation of the MOF Models To Text Transformation Language, which is the OMG (Object Management Group) standard for the definition of model-to-text transformations. It is therefore a standard language designed by the same group who designed MOF, UML, SysML and MDA in general. Xpand is a language which, I guess, existed before the standard, but it is now different from it.
If you start from scratch then start with Acceleo.
In my case, I use a custom metamodel (derived from UML2) with custom stereotypes and stereotype properties. I tried both the Acceleo and Xpand template languages. Indeed, they are pretty similar in terms of structure and capabilities.
However, I can see one big difference (which makes Xpand much better in this use case): you can use your custom stereotypes in your Xpand templates.
Xpand engine brilliantly chooses the "best matching template/rule" for every stereotype (taking into account inheritance between stereotypes as well).
Furthermore, it is very easy to obtain stereotype properties.
These two "features" make the templates very elegant, compact and readable.
For example:
«DEFINE myTemplate FOR MyUmlProfile::MyStereoType»
MyValue: «this.myStereotypeProperty» or simply: «myStereotypeProperty»
«ENDDEFINE»
In Acceleo, I found it clumsy to achieve the same (longer statements, more code) and my templates ended up lengthy and complex. The positive thing about Acceleo, however, was that it worked conveniently from IBM RSA (applied directly to RSA (emx) models). It has code highlighting and auto-complete working nicely.
Xpand only worked if I exported my RSA models to ".uml" (~XML) format. It doesn't offer code highlighting or auto-complete (or at least I didn't figure out how).
Considering all pros and cons, I still vote for Xpand (in my use case).

Complete metaprogramming framework for Java?

I'm interested in metaprogramming (i.e. programs that help programmers do tedious programming tasks). I'm looking for a tool which has the following properties:
usable both at compile time and runtime;
inspects program structure;
can add new classes, methods or fields and make them visible to Java compiler;
can change behavior of methods;
Java-based (well, Java is most popular programming language according to some rankings);
good integration with IDEs and build tools like Ant, Gradle or Maven;
actively maintained project;
easy to use and extend.
There are some solutions for this, like:
reflection
AspectJ
Annotation Processing Tool
bytecode manipulation (CGLIB, Javassist, java.lang.instrument; see the Javassist sketch below)
Eclipse JDT
Project Lombok
Groovy, JRuby, Scala
But unfortunately none of them meets all the criteria above. Is there any complete metaprogramming solution for Java?
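For concreteness, here is roughly what the bytecode-manipulation option looks like with Javassist: it covers "change behavior of methods" at runtime, but not the compile-time visibility criterion. The class and method names are invented for the example:

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

public class JavassistDemo {
    public static void main(String[] args) throws Exception {
        // Fetch the bytecode of a (hypothetical) class and patch one of its methods.
        ClassPool pool = ClassPool.getDefault();
        CtClass cc = pool.get("com.example.Service");
        CtMethod m = cc.getDeclaredMethod("handle");
        m.insertBefore("{ System.out.println(\"entering handle\"); }");
        // Make the modified class visible to this JVM (must run before the class is loaded).
        Class<?> patched = cc.toClass();
        System.out.println("patched: " + patched.getName());
    }
}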
There's JackPot, which is Java-based, but I don't think it gets a lot of current attention. It has ASTs and symbol tables, AFAIK. You can probably extend it; I doubt anybody will stop (or help) you.
There are the Java-based compiler APIs for the Sun, er, Oracle Java compiler. They're likely actively maintained, but I don't think you can modify source code and regenerate it. The compiler certainly has symbol tables; dunno about trees. Probably pretty hard to extend; you have to keep up with the compiler guys, not the other way round.
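The supported way to drive that compiler programmatically is javax.tools (since Java 6). A minimal sketch, with an illustrative source path:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileDemo {
    public static void main(String[] args) {
        // Returns null on a bare JRE; a JDK is required.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // in/out/err default to the System streams when null; nonzero means compile errors.
        int result = compiler.run(null, null, null, "src/com/example/Hello.java");
        System.out.println(result == 0 ? "compiled" : "failed");
    }
}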
There is ANTLR, which has a Java implementation and a Java parser that will build ASTs. I don't think it has full symbol tables, so doing serious code analysis/revision is likely to be hard. ANTLR is certainly actively maintained, and nobody will object to you enhancing the Java grammar with symbol tables. Just know that will take you about 6 months for Java 1.6 if that's all you do. (That's how long it took our internal [smart] guy to do it for DMS, starting with symbol table support for 1.4).
Not in Java, and not easily integrated into IDEs, but capable of carrying out massive analysis and transformation on Java code, is our DMS Software Reengineering Toolkit with its Java Front End.
DMS is generic compiler machinery: parsing, AST building, symbol table machinery, flow analysis machinery, with the additional bonuses of source-to-source transformations and generic prettyprinting of ASTs back to legal text, including retention of comments. It offers a set of APIs supporting these services, and additional tools for defining grammars and language-dependent flow analyzers.
The Java Front End gives the crucial detail (using those APIs) to DMS to allow it to process Java: a grammar/parser, full symbol table construction for Java 1.4-1.6 (with 1.7 due momentarily), as well as some control and data flow analysis (to be extended over time, because this stuff is so useful).
By using the services provided by DMS and the Java Front End, one can reasonably contemplate building arbitrary Java analysis and transformation tools. (This makes the tool a "complete" metaprogramming tool, in that it can inspect or change any language structure, as opposed to, say, template metaprogramming or reflection.) We believe this to be much more effective than ad hoc tools, because you don't have to build the infrastructure, the infrastructure provided is robust and handles cases you don't have the energy to implement, and it is designed to support such tasks. YMMV.
DMS/Java Front end have been used to construct a variety of Java tools: test coverage, profilers, dead code elimination, clone detection on scale, JavaDoc with hyperlinked source-code, fast XML parser/generators, etc.
Yes, it's actively maintained; it has been undergoing continuous enhancement since the first version in 1998.
There's a Java metaprogramming framework that is part of Tapestry IoC; it's called Plastic. It munges class bytecode using custom classloaders. I haven't tried it yet, but it looks like it gives a simple interface that still enables the programmer to make powerful metaprogramming changes.
Check out the Meta Programming System:
http://www.jetbrains.com/mps/
It has great IDE support and is used quite frequently by the smart folks at JetBrains.
Check out Spring Roo.

Yacc equivalent for Java

I'm working on a compiler design project in Java. Lexical analysis is done (using JFlex) and I'm wondering which yacc-like tool would be best (most efficient, easiest to use, etc.) for doing syntactic analysis, and why.
If you specifically want YACC-like behavior (table-driven), the only one I know is CUP.
In the Java world, it seems that more people lean toward recursive descent parsers like ANTLR or JavaCC.
And efficiency is seldom a reason to pick a parser generator.
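For a flavor of the CUP route, wiring a JFlex scanner (generated with the %cup directive) to a CUP-generated parser typically looks like the sketch below. The names Lexer and parser are common defaults but should be treated as assumptions:

import java.io.FileReader;
import java_cup.runtime.Symbol;

public class ParseDemo {
    public static void main(String[] args) throws Exception {
        // JFlex (%cup) emits a scanner implementing java_cup.runtime.Scanner;
        // CUP emits a class named "parser" by default. The input path is illustrative.
        Lexer lexer = new Lexer(new FileReader("input.txt"));
        parser p = new parser(lexer);
        Symbol result = p.parse();
        System.out.println("parse result: " + result.value);
    }
}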
In the past, I've used ANTLR for both the lexer and parser, and the JFlex homepage says it can interoperate with ANTLR. I wouldn't say that ANTLR's online documentation is that great. I ended up investing in 'The Definitive ANTLR Reference', which helped considerably.
GNU Bison has a Java interface:
http://www.gnu.org/software/bison/manual/html_node/Java-Bison-Interface.html
You can use it to generate Java code.
There is also jacc.
Jacc is about as close to yacc as you can get, but it is implemented in pure Java and generates a Java parser.
It interfaces well with JFlex.
http://web.cecs.pdx.edu/~mpj/jacc/
Another option would be the GOLD Parser.
Unlike many of the alternatives, the GOLD parser generates the parsing tables from the grammar and places them in a binary, non-executable file. Each supported language then has an engine which reads the binary tables and parses your source file.
I've not used the Java implementation specifically, but have used the Delphi engine with fairly good results.

Best design for generating code from an AST?

I'm working on a pretty complex DSL that I want to compile down into a few high-level languages. The whole process has been a learning experience. The compiler is written in Java.
I was wondering if anyone knew a best practice for the design of the code generator portion. I currently have everything parsed into an abstract syntax tree.
I was thinking of using a template system, but I haven't researched that direction too far yet as I would like to hear some wisdom first from stack overflow.
Thanks!
When I was doing this back in my programming languages class, we ended up using emitters based on the visitor pattern. It worked pretty well: it makes retargeting to new output languages pretty easy, as long as your AST matches what you're printing fairly well.
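A minimal sketch of that emitter shape, with node and visitor names invented for the example; retargeting to another output language means writing another visitor:

// A tiny expression AST with a visitor-based emitter.
interface Node {
    <R> R accept(Visitor<R> v);
}

interface Visitor<R> {
    R visit(NumberLit n);
    R visit(BinaryOp b);
}

class NumberLit implements Node {
    final int value;
    NumberLit(int value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visit(this); }
}

class BinaryOp implements Node {
    final String op;
    final Node left, right;
    BinaryOp(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) { return v.visit(this); }
}

// Emits C-like expression syntax; a second visitor could emit, say, Python.
class CEmitter implements Visitor<String> {
    public String visit(NumberLit n) { return Integer.toString(n.value); }
    public String visit(BinaryOp b) {
        return "(" + b.left.accept(this) + " " + b.op + " " + b.right.accept(this) + ")";
    }
}

public class EmitDemo {
    public static void main(String[] args) {
        Node ast = new BinaryOp("+", new NumberLit(1),
                new BinaryOp("*", new NumberLit(2), new NumberLit(3)));
        System.out.println(ast.accept(new CEmitter())); // prints (1 + (2 * 3))
    }
}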
What you really want is a program transformation system: a tool that maps syntax structures in one language (your DSL) into syntax patterns in other languages. Such a tool can carry out arbitrary transformations during code generation (tree rewrites generalize string rewrites, which are Post systems, which are fully Turing-capable), which means that what you generate, and how sophisticated your generation process is, are determined only by your ambition, not by "code generator framework" properties.
Sophisticated program transformation systems combine various types of scoping, flow analysis and/or custom analyzers to enable the transformations. This doesn't add any theoretical power, but it adds a lot of practical power: most real languages (even DSLs) have namespaces and control and data flow, need type inference, etc.
Our DMS Software Reengineering Toolkit is this type of transformation system. It has been used to analyze/transform both conventional languages and DSLs, for simple and complex languages, and for small, large and even huge software systems.
Regarding the OP's comments about "turning the AST into other languages": that is accomplished in DMS by writing transformations that map surface syntax for the DSL (implemented behind the scenes using the DSL's AST) to surface syntax for the target language (implemented using target-language ASTs). The resulting target-language AST is then prettyprinted automatically by DMS to provide actual source code in the target language that corresponds to the target AST.
If you are already using ANTLR and have your AST ready you might want to take a look at StringTemplate:
http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation
Also Section 9.6 of The Definitive ANTLR Reference: Building Domain-Specific Languages explains this:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
The free code samples are available at http://media.pragprog.com/titles/tpantlr/code/tpantlr-code.tgz. In the subfolder code\templates\generator\2pass\ you'll find an example converting mathematical expressions to Java bytecode.
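For a taste of the template side, here is a minimal StringTemplate 4 sketch (the book above covers the older ST 3 API, so treat the exact import and default delimiters as assumptions to verify):

import org.stringtemplate.v4.ST;

public class StDemo {
    public static void main(String[] args) {
        // Attributes would normally come from AST nodes during a tree walk.
        ST method = new ST("<type> <name>() { return <expr>; }");
        method.add("type", "int");
        method.add("name", "answer");
        method.add("expr", "42");
        System.out.println(method.render()); // int answer() { return 42; }
    }
}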

JavaME-suitable grammar compiler recommendations?

I want to parse some data, and I have a BNF grammar to parse it with. Can anyone recommend any grammar compilers capable of generating code that can be used on a mobile device?
Since this is for JavaME, the generated code must be:
Hopefully pretty small
Low dependencies on exotic Java libraries
Not dependent on any runtime jar files.
I have used JFlex before, and I know it satisfies your second and third requirements. But I don't know how big the generated code might be. According to the manual, it generates a packed DFA table by default, so it might not be too bad.
The first question is: do you have an existing grammar definition? When I've ported an LALR grammar to Java, I've used JFlex/CUP.
If you're starting from scratch, I'd suggest you use JavaCC/FreeCC, which is an LL(k) parser generator. It's quite well documented, and there are no runtime dependencies.
