I am working on a project that involves a new language implementation, and I have been assigned the task of constructing a translator for the language. The translator should be built in Java, and it should translate a subset of the new language into C. I have a few questions regarding that:
1. How should I proceed with this?
2. Which phase should I emphasize most? Should it be the code generation phase of the compiler?
3. Do I need a second grammar for the target language?
Thanks in advance.
I'd investigate ANTLR, if you're not already at least aware of it. From http://www.antlr.org/about.html (emphasis mine):
ANTLR, ANother Tool for Language Recognition, is a language tool that
provides a framework for constructing recognizers, compilers, and
translators from grammatical descriptions containing actions in a
variety of target languages. ANTLR automates the construction of
language recognizers. From a formal grammar, ANTLR generates a program
that determines whether sentences conform to that language. In other
words, it's a program that writes other programs. By adding code
snippets to the grammar, the recognizer becomes a translator or
interpreter. ANTLR provides excellent support for intermediate-form
tree construction, tree walking, translation and provides
sophisticated automatic error recovery and reporting.
As an added bonus, ANTLR is written in and easily callable by Java.
Additional details are available at http://en.wikipedia.org/wiki/ANTLR.
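To make the "adding code snippets" idea concrete, here is a minimal sketch of the Java driver side. It assumes you have written an ANTLR 4 grammar named MyLang containing a rule such as `assign : ID '=' INT ';'` and have run ANTLR (4.7+, with the -visitor option) to generate MyLangLexer, MyLangParser, and MyLangBaseVisitor; all of these names are illustrative, not part of any shipped grammar:

```java
// Minimal sketch: translate a toy DSL assignment to C using an
// ANTLR 4 generated parser and visitor. The grammar MyLang and its
// generated classes are assumed, not provided by ANTLR itself.
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class MyLangToC {

    // Visitor that emits a C statement for each DSL assignment.
    static class CEmitter extends MyLangBaseVisitor<String> {
        @Override
        public String visitAssign(MyLangParser.AssignContext ctx) {
            // Map "x = 42;" in the DSL to "int x = 42;" in C.
            return "int " + ctx.ID().getText() + " = " + ctx.INT().getText() + ";";
        }
    }

    public static void main(String[] args) {
        MyLangLexer lexer = new MyLangLexer(CharStreams.fromString("x = 42;"));
        MyLangParser parser = new MyLangParser(new CommonTokenStream(lexer));
        // Parse the assign rule and print the generated C.
        System.out.println(new CEmitter().visit(parser.assign()));
    }
}
```

This also suggests an answer to question 3 above: the target language normally does not need a grammar of its own, because the translator only emits C as text; a grammar is needed only for the language being parsed.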
I'm really interested in parser combinators, especially ones that can deal with left-recursive and ambiguous grammars. I know the fabulous Superpower by Nicholas Blumhardt, but it's unable to deal with these kinds of grammars.
I've found some GLL parser combinator libraries, like https://github.com/djspiewak/gll-combinators, but that one uses Scala, which is a big inconvenience for me (I don't know that language).
I would like to know if there is anything like this in C# (or Java).
Thank you very much.
I did a compiler project using Java in the IntelliJ IDE with the ANTLR 4 extension; there are good resources out on the internet. The official book is The Definitive ANTLR 4 Reference, which I find quite good, and they also offer nice documentation.
ANTLR 4 can deal with (directly) left-recursive rules and resolves ambiguities automatically, and you can implement the compiler in C#, Java, or, I think, any of its other target languages.
You can also use their starter grammars for many different languages.
Edit:
ANTLR 4 is a tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
It's NOT a library.
I am building an AST by hand to use with my application. I currently have a lot of data in my program's memory, stored using a standard OO approach, that I will use to form the AST.
I was wondering if by chance there are already any frameworks / code generators that could help me with this task.
I am not looking for a compiler compiler. I don't want to define a grammar and have a code generator produce a parser for it. I intend to instantiate the nodes of the tree myself; I am only looking for a faster and cheaper way to build the .java files for the node classes themselves (a plus would be options for the nodes' attributes, optional beginVisit() / endVisit() methods, etc.).
I would highly recommend that you take a look at Eclipse's Java Development Tools. It includes a very robust AST framework.
My understanding is that with this API you have access to all attributes of the various types of AST nodes, and you can also create visitors with visit() and endVisit() methods (JDT's counterpart to the beginVisit() / endVisit() pair you mention).
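As a small sketch of what this looks like, here is a program that builds a trivial compilation unit with the JDT DOM API and then walks it with a visitor. It assumes the org.eclipse.jdt.core bundle (and its dependencies) is on the classpath; the package and class names are made up for illustration:

```java
// Sketch: build a .java compilation unit programmatically with the
// Eclipse JDT DOM, then traverse it with visit()/endVisit() pairs.
// Requires org.eclipse.jdt.core and its dependencies on the classpath.
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.Modifier;
import org.eclipse.jdt.core.dom.PackageDeclaration;
import org.eclipse.jdt.core.dom.TypeDeclaration;

public class JdtAstDemo {
    @SuppressWarnings("unchecked") // the DOM API exposes raw lists
    public static void main(String[] args) {
        AST ast = AST.newAST(AST.JLS8); // pick the source level you need
        CompilationUnit cu = ast.newCompilationUnit();

        PackageDeclaration pkg = ast.newPackageDeclaration();
        pkg.setName(ast.newName("com.example.generated"));
        cu.setPackage(pkg);

        TypeDeclaration type = ast.newTypeDeclaration();
        type.setName(ast.newSimpleName("Generated"));
        type.modifiers().add(ast.newModifier(Modifier.ModifierKeyword.PUBLIC_KEYWORD));
        cu.types().add(type);

        // toString() gives a rough source rendering (fine for debugging;
        // use a formatter/rewriter for production output).
        System.out.println(cu);

        // Traversal: visit() on entry, endVisit() on exit of each node.
        cu.accept(new ASTVisitor() {
            @Override
            public boolean visit(TypeDeclaration node) {
                System.out.println("entering " + node.getName());
                return true; // descend into children
            }
            @Override
            public void endVisit(TypeDeclaration node) {
                System.out.println("leaving " + node.getName());
            }
        });
    }
}
```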
This seems to be the answer to the question:
http://www.jetbrains.com/mps/
The major goal of MPS is to allow extending languages. This is because
every existing language already has a strict language syntax defined,
which limits its flexibility.
The problem in extending language syntax is mainly the textual
presentation of code. This is especially true if we want to use
different language extensions, where each one may have its own syntax.
This naturally leads to the idea of non-textual presentation of
program code. A major benefit of this approach is that it eliminates
the need for code parsing. Our solution is to have code always
maintained in an Abstract Syntax Tree (AST), which consists of nodes
with properties, children and references, and fully describes the
program code.
At the same time, MPS offers an efficient way to keep writing code in
a text-like manner.
In creating a language, you define the rules for code editing and
rendering. You can also specify the language type-system and
constraints. This allows MPS to verify program code on the fly, and
thus makes programming with the new language easy and less
error-prone.
MPS uses a generative approach. You can also define generators for
your language to transform code in the custom language into compilable
code in some conventional language. Currently, MPS is particularly
good for, but is not limited to, generating Java code. You can also
generate XML, HTML, JavaScript, and more.
Does anybody know of a Java implementation of Constraint Grammar for natural language processing? I know of the VISL CG3 implementation, which is in C++, and I could interface with it from Java, but it would be easier if I could find a Java implementation, since it will be integrated into legacy Java code.
It will be used in a Portuguese open-source grammar checker and should be compatible with the LGPL license.
Have a look at JAPE: Regular Expressions over Annotations, a formalism based on CPSL (the Common Pattern Specification Language) from the old TIPSTER project.
It's not truly context-dependent (as a Constraint Grammar should be), but it's possible to do context-dependent things with it. It is free and open source, and there are a lot of Java examples.
XTDL, from the SPROUT project, is also worth a look. I'm not sure whether it's free or not.
I'm not sure whether you are looking for regexes over semantic graphs and tree structures. If that's the case, you can check out Tregex and Semgrex, which match over Stanford dependency graphs and constituency trees.
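For a taste of Tregex, here is a short sketch, assuming the Stanford CoreNLP jar is on the classpath (the tree and pattern are toy examples):

```java
// Sketch: match a Tregex tree pattern against a toy constituency tree.
// Assumes the Stanford CoreNLP jar is available on the classpath.
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

public class TregexDemo {
    public static void main(String[] args) {
        // A toy constituency tree in Penn Treebank notation.
        Tree tree = Tree.valueOf("(ROOT (S (NP (DT the) (NN dog)) (VP (VBZ barks))))");
        // "NP < NN" matches every NP that immediately dominates an NN.
        TregexPattern pattern = TregexPattern.compile("NP < NN");
        TregexMatcher matcher = pattern.matcher(tree);
        while (matcher.find()) {
            System.out.println(matcher.getMatch()); // prints (NP (DT the) (NN dog))
        }
    }
}
```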
I haven't tried Graph-Expression, but the site states that it provides a language for the "structure of match - it is possible to build syntax tree based on match". I think this is comparable to JAPE (as the site states: "fast - it works faster then Jape transducer (gate.ac.uk) closest project to this one"). And I assume it can handle graphs, something JAPE may not be good at.
I need to implement an IDL-to-Java compiler. In fact, it's not exactly IDL-to-Java: the Interface Definition Language has been extended. So I need to implement a compiler that can generate Java source files. I know nothing about CORBA, and I find it hard to start. Do you think it's possible for me to finish this work in half a year? And if so, what should I do? P.S.: Please forgive my English.
If you don't know anything about parsers and parser generators, it's going to be a tough job, but I think that half a year should be plenty if you don't start from scratch.
I suggest that you use ANTLR, which happens to have an IDL parser implementation among its contributed examples. It is probably for an older version of ANTLR, but it's definitely a good starting point. Be sure to get hold of the ANTLR book; you're going to need it!
For the code generation part you could use StringTemplate, a template engine written by ANTLR's author, Terence Parr, for exactly this purpose.
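For instance, with the StringTemplate 4 Java API a generated method stub might look like this (a sketch; the template text and attribute names are made up for illustration):

```java
// Minimal StringTemplate (ST4) sketch; the template and attribute
// names are invented, not part of any IDL mapping.
import org.stringtemplate.v4.ST;

public class StubGenerator {
    public static void main(String[] args) {
        // ST4 uses <...> as its default attribute delimiters.
        ST method = new ST(
            "public <type> <name>() {\n" +
            "    throw new UnsupportedOperationException();\n" +
            "}\n");
        method.add("type", "String");
        method.add("name", "getId");
        System.out.println(method.render());
    }
}
```

Keeping the output shape in templates like this, instead of string concatenation scattered through the compiler, makes the generated code much easier to change later.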
If you really have to implement a whole ORB you might as well check out how others did it, e.g. here.
A true IDL-to-Java compiler not only spews out Java code that maps back to the IDL definitions (strictly adhering to the OMG standards); it also generates Java code that lets your definitions work with an underlying CORBA stack (not unlike a true compiler generating instructions for a target hardware architecture).
That is, an IDL compiler
1) takes your IDL definitions and converts them into stack-independent, language-specific definitions (in your case, in Java), and
2) in addition to that, generates CORBA-stack/vendor-specific code as well.
If all you need is something that does #1, then it's not an IDL-to-Java compiler in the true sense of the word, but we can call it that for the sake of simplicity.
So you have two possible routes here:
1) Look at the source code of the IDL compilers from existing Java-based CORBA stacks (OpenORB or JacORB), or
2) look at the OMG specs that tell you how to map from IDL to your language of choice: http://www.omg.org/technology/documents/idl2x_spec_catalog.htm
This all assumes you know about compiler theory and implementation. Otherwise, if this is an experiment in learning, great! But if this is part of work with a deadline, it could be an unrealistic task.
Either way, good luck.
You can use idl4emf:
http://code.google.com/p/idl4emf/
This project is composed of an IDL grammar implementation in Xtext and an IDL metamodel implementation in Ecore.
It also includes a code generator project for IDL files; you can implement your own generator from IDL files just by writing Xpand templates in Eclipse EMF.
I've used this project successfully as part of several generator projects.
I'm working on a pretty complex DSL that I want to compile down into a few high-level languages. The whole process has been a learning experience. The compiler is written in Java.
I was wondering if anyone knows best practices for the design of the code generator portion. I currently have everything parsed into an abstract syntax tree.
I was thinking of using a template system, but I haven't researched that direction very far yet, as I would first like to hear some wisdom from Stack Overflow.
Thanks!
When I was doing this back in my programming-languages class, we ended up using emitters based on the visitor pattern. It worked pretty well, and it makes retargeting to new output languages fairly easy, as long as your AST matches what you're printing reasonably well.
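Something along these lines (a hand-rolled, self-contained sketch; the node, visitor, and emitter names are invented for illustration):

```java
// Sketch of a visitor-based emitter: one visitor per target language,
// so retargeting means writing a new visitor, not touching the AST.
interface Visitor<T> {
    T visitNum(Num n);
    T visitAdd(Add a);
}

abstract class Node {
    abstract <T> T accept(Visitor<T> v);
}

class Num extends Node {
    final int value;
    Num(int value) { this.value = value; }
    <T> T accept(Visitor<T> v) { return v.visitNum(this); }
}

class Add extends Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    <T> T accept(Visitor<T> v) { return v.visitAdd(this); }
}

// Emits C-style expression syntax; a JsEmitter, PyEmitter, etc.
// would implement the same interface.
class CEmitter implements Visitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAdd(Add a) {
        return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
    }
}

public class EmitterDemo {
    public static void main(String[] args) {
        Node ast = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(ast.accept(new CEmitter())); // prints (1 + (2 + 3))
    }
}
```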
What you really want is a program transformation system: a tool that maps syntax structures in one language (your DSL) into syntax patterns in other languages. Such a tool can carry out arbitrary transformations during the code generation process (tree rewrites generalize string rewrites, which are Post systems, which are fully Turing-capable), which means that what you generate, and how sophisticated your generation process is, are determined only by your ambition, not by the properties of a "code generator framework".
Sophisticated program transformation systems combine various types of scoping, flow analysis, and/or custom analyzers to enable the transformations. This doesn't add any theoretical power, but it adds a lot of practical power: most real languages (even DSLs) have namespaces, control and data flow, a need for type inference, etc.
Our DMS Software Reengineering Toolkit is this type of transformation system. It has been used to analyze/transform both conventional languages and DSLs, for simple and complex languages, and for small, large and even huge software systems.
Regarding the OP's comments about "turning the AST into other languages": DMS accomplishes that with transformations that map surface syntax for the DSL (implemented behind the scenes by the DSL's AST) to surface syntax for the target language (implemented using target-language ASTs). The resulting target-language AST is then prettyprinted automatically by DMS to produce actual source code in the target language corresponding to the target AST.
If you are already using ANTLR and have your AST ready you might want to take a look at StringTemplate:
http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation
Also Section 9.6 of The Definitive ANTLR Reference: Building Domain-Specific Languages explains this:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
The free code samples are available at http://media.pragprog.com/titles/tpantlr/code/tpantlr-code.tgz. In the subfolder code/templates/generator/2pass/ you'll find an example that converts mathematical expressions to Java bytecode.