Is there any GLL parser combinator library in C# / Java?

I'm really interested in parser combinators, especially those that can deal with left-recursive and ambiguous grammars. I know the fabulous Superpower by Nicholas Blumhardt, but it can't handle these kinds of grammars.
I've found some GLL parser combinator libraries such as https://github.com/djspiewak/gll-combinators, but it is written in Scala, which is a big inconvenience for me (I don't know that language).
I would like to know whether anything like this exists for C# (or Java).
Thank you very much.

I did a compiler project in Java, using IntelliJ with the ANTLR 4 plugin, and there are good resources on the internet. The official book, "The Definitive ANTLR 4 Reference", is quite good, and the project also offers nice documentation.
ANTLR 4 can deal with directly left-recursive rules, and it resolves ambiguous input by choosing one alternative rather than returning every possible parse. It can generate parsers in C#, Java, and several other languages.
Starter grammars are available for many different languages.
Edit:
ANTLR 4 is a tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
It's NOT a library.
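For contrast with a generator like ANTLR, here is a rough sketch of what the combinator style from the question looks like in plain Java. It uses the classic "list of successes" technique, not GLL: every parser returns all the positions where a match can end, so an ambiguous grammar yields more than one result. All the names (Parser, ch, seq, alt, lazy, AmbiguityDemo) are made up for this example and do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// A parser maps (input, start) to EVERY position where a match can end.
// An empty list means failure; duplicates mean several derivations reach the same end.
interface Parser {
    List<Integer> parse(String in, int pos);

    // Matches one literal character.
    static Parser ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) return List.of(pos + 1);
            return List.of();
        };
    }

    // Matches the empty string.
    static Parser empty() {
        return (in, pos) -> List.of(pos);
    }

    // Sequence: run b from every end position of a.
    static Parser seq(Parser a, Parser b) {
        return (in, pos) -> {
            List<Integer> ends = new ArrayList<>();
            for (int mid : a.parse(in, pos)) ends.addAll(b.parse(in, mid));
            return ends;
        };
    }

    // Choice: keep the results of BOTH alternatives instead of committing to one.
    static Parser alt(Parser a, Parser b) {
        return (in, pos) -> {
            List<Integer> ends = new ArrayList<>(a.parse(in, pos));
            ends.addAll(b.parse(in, pos));
            return ends;
        };
    }

    // Defers construction so that a rule can refer to itself.
    static Parser lazy(Supplier<Parser> rule) {
        return (in, pos) -> rule.get().parse(in, pos);
    }
}

class AmbiguityDemo {
    // S ::= 'a' S | 'a' 'a' S | (empty)    ambiguous, but not left-recursive
    static Parser s() {
        Parser self = Parser.lazy(AmbiguityDemo::s);
        Parser one = Parser.seq(Parser.ch('a'), self);
        Parser two = Parser.seq(Parser.ch('a'), Parser.seq(Parser.ch('a'), self));
        return Parser.alt(one, Parser.alt(two, Parser.empty()));
    }

    // Counts the derivations that consume the whole input.
    static long countParses(String input) {
        return s().parse(input, 0).stream().filter(end -> end == input.length()).count();
    }

    public static void main(String[] args) {
        System.out.println(countParses("aaa")); // prints 3
    }
}
```

A left-recursive rule such as S ::= S 'a' would make this sketch call itself forever without consuming input. That limitation is what GLL combinator libraries are designed to remove.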

Related

Yacc equivalent for Java

I'm working on a compiler design project in Java. Lexical analysis is done (using JFlex), and I'm wondering which yacc-like tool would be best (most efficient, easiest to use, etc.) for doing syntactic analysis, and why.
If you specifically want YACC-like behavior (table-driven), the only one I know is CUP.
In the Java world, it seems that more people lean toward tools that generate recursive-descent parsers, such as ANTLR or JavaCC.
And efficiency is seldom a reason to pick a parser generator.
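To show what "recursive descent" means in practice, here is a small hand-written sketch (not output from ANTLR or JavaCC, and the ExprParser class is invented for this example). Each grammar rule becomes one method. The yacc-style left-recursive rule expr ::= expr '-' term cannot be written as a method that calls itself first, so it is expressed as a loop, which still keeps subtraction left-associative.

```java
// Hand-written recursive descent for the grammar
//   expr ::= term (('+' | '-') term)*
//   term ::= NUMBER | '(' expr ')'
class ExprParser {
    private final String src;
    private int pos = 0;

    ExprParser(String src) {
        this.src = src.replace(" ", ""); // this sketch simply drops spaces
    }

    // Parses and evaluates the whole input, rejecting trailing garbage.
    static int eval(String text) {
        ExprParser p = new ExprParser(text);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // expr: the loop replaces the left-recursive rule expr ::= expr op term.
    private int expr() {
        int value = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term: a number or a parenthesized expression.
    private int term() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = expr();
            expect(')');
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("number expected at " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(ExprParser.eval("10 - 3 - 2"));   // prints 5, i.e. (10 - 3) - 2
        System.out.println(ExprParser.eval("10 - (3 - 2)")); // prints 9
    }
}
```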
In the past, I've used ANTLR for both lexer and parser, and the JFlex homepage says it can interoperate with ANTLR. I wouldn't say that ANTLR's online documentation is that great. I ended up investing in 'The Definitive ANTLR Reference', which helped considerably.
GNU Bison has a Java interface,
http://www.gnu.org/software/bison/manual/html_node/Java-Bison-Interface.html
You can use it to generate Java code.
There is also jacc.
Jacc is about as close to yacc as you can get, but it is implemented in pure Java and generates a Java parser.
It interfaces well with JFlex:
http://web.cecs.pdx.edu/~mpj/jacc/
Another option would be the GOLD Parser.
Unlike many of the alternatives, the GOLD parser generates the parsing tables from the grammar and places them in a binary, non-executable file. Each supported language then has an engine which reads the binary tables and parses your source file.
I've not used the Java implementation specifically, but have used the Delphi engine with fairly good results.

Best design for generating code from an AST?

I'm working on a pretty complex DSL that I want to compile down into a few high-level languages. The whole process has been a learning experience. The compiler is written in Java.
I was wondering if anyone knew a best practice for the design of the code generator portion. I currently have everything parsed into an abstract syntax tree.
I was thinking of using a template system, but I haven't researched that direction too far yet, as I would like to hear some wisdom from Stack Overflow first.
Thanks!
When I was doing this back in my programming languages class, we ended up using emitters based on following the visitor pattern. It worked pretty well - makes retargeting it to new output languages pretty easy, as long as your AST matches what you're printing fairly well.
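A minimal sketch of that idea, assuming a toy AST with only two node types. The names Node, Num, Add, Visitor, InfixEmitter and LispEmitter are invented for illustration. Each target language gets its own emitter, and the AST classes never change when a new target is added.

```java
// Hypothetical two-node AST; a real DSL would have many more node types.
interface Node {
    <R> R accept(Visitor<R> v);
}

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
}

final class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
}

// One method per node type; each emitter implements all of them.
interface Visitor<R> {
    R visitNum(Num n);
    R visitAdd(Add a);
}

// Emits C/Java-style infix syntax.
final class InfixEmitter implements Visitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAdd(Add a) {
        return "(" + a.left.accept(this) + " + " + a.right.accept(this) + ")";
    }
}

// Emits Lisp-style prefix syntax from the same tree.
final class LispEmitter implements Visitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAdd(Add a) {
        return "(+ " + a.left.accept(this) + " " + a.right.accept(this) + ")";
    }
}

class EmitterDemo {
    public static void main(String[] args) {
        Node ast = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(ast.accept(new InfixEmitter())); // prints (1 + (2 + 3))
        System.out.println(ast.accept(new LispEmitter()));  // prints (+ 1 (+ 2 3))
    }
}
```

Retargeting then means writing one more Visitor implementation. The catch mentioned above still applies: this stays easy only while the AST shape is close to what the target language needs.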
What you really want is a program transformation system that maps syntax structures in one language (your DSL) into syntax patterns in other languages. Such a tool can carry out arbitrary transformations (tree rewrites generalize string rewrites, which are Post systems, which are Turing-complete) during code generation, which means that what you generate, and how sophisticated your generation process is, is limited only by your ambition, not by "code generator framework" properties.
Sophisticated program transformation systems combine various types of scoping, flow analysis, and/or custom analyzers to enable the transformations. This doesn't add any theoretical power, but it adds a lot of practical power: most real languages (even DSLs) have namespaces and control and data flow, need type inference, and so on.
Our DMS Software Reengineering Toolkit is this type of transformation system. It has been used to analyze/transform both conventional languages and DSLs, for simple and complex languages, and for small, large and even huge software systems.
Regarding the OP's comments about "turning the AST into other languages": DMS accomplishes that with transformations that map surface syntax for the DSL (implemented behind the scenes as the DSL's AST) to surface syntax for the target language (implemented using target-language ASTs). DMS then prettyprints the resulting target-language AST automatically to produce actual source code in the target language.
If you are already using ANTLR and have your AST ready you might want to take a look at StringTemplate:
http://www.antlr.org/wiki/display/ST/StringTemplate+Documentation
Also Section 9.6 of The Definitive ANTLR Reference: Building Domain-Specific Languages explains this:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
The free code samples are available at http://media.pragprog.com/titles/tpantlr/code/tpantlr-code.tgz. In the subfolder code\templates\generator\2pass\ you'll find an example that converts mathematical expressions to Java bytecode.

What's the best tool for generating a parser in Java for my own language grammar?

I'm developing a small programming language based mostly on the C99 standard. I've already written a fairly decent lexer in Java, and now I'm looking to generate a Java parser from the grammar. I know there's Bison, but that seems to generate only C code. I'm looking for an application that lets me input my grammar and creates a full parser class in Java. Reading other SO posts on related topics, I've found ANTLR, but I'm wondering whether anyone on SO knows of a better tool.
thanks!
Another couple to look at are JavaCC and SableCC (it has been a long time since I looked at SableCC).
I've been quite impressed by BNFC, which is able to generate parsers in Java as well as in C, C++, C#, F#, Haskell, and OCaml.
The JFlex home page at http://jflex.de indicates where to find Bison-like tools that can target Java:
http://byaccj.sourceforge.net/
http://www2.cs.tum.edu/projects/cup/
http://www.antlr.org/

C++ Parser/Model for Java

I was wondering if anyone knows of existing C++ parsers/code models that can be used programmatically in Java. I'm looking for something similar to the Eclipse CDT that can be used as a library from Java (and that does not rely upon Eclipse). Thanks in advance.
You don't want to build your own C++ parser. It'll kill you.
You already know about the Eclipse CDT project:
www.ibm.com/developerworks/library/os-ecl-cdt3/index.html
AFAIK, that parser is, well, a bit fuzzy around the edges. YMMV.
Advantage: in Java (and in Eclipse if you care).
If you want to process C++, and do it in Java, this might be your only practical choice.
There is also our DMS Software Reengineering Toolkit C++ front end:
http://www.semdesigns.com/Products/FrontEnds/CppFrontEnd.html
Works with a wide variety of C++ dialects (ANSI, GNU, MSVC 2005/2008), tested by fire on millions of lines of code.
Disadvantage from your point of view: Not in Java.
But if you really want to analyze C++, making a rule that you are only willing to do it in Java might not serve you the best.
There are some incomplete C++ grammars for lexer and parser generators such as Lex, Yacc, ANTLR, and Jack.
Parsing C++ is undecidable in general, so LALR and BNF grammars will always be incomplete, but as long as you're not trying to write a C++ compiler, they should be good enough.
There are some C++ grammars out there for JavaCC. Try Google.

Which Java oriented lexer parser for simple project (ANTLR, DIY, etc)

I am working on a small text editor project and want to add basic syntax highlighting for a couple of languages (Java and XML, just to name a few). As a learning experience, I want to use one of the popular (or less popular) Java lexer/parser tools.
Which project do you recommend? ANTLR is probably the best known, but it seems pretty complex and heavy.
Here are the options that I know of:
Antlr
Ragel (yes, it can generate Java source for processing input)
Do it yourself (I guess I could write a simple token parser and highlight the source code).
ANTLR or JavaCC would be the two I know. I'd recommend ANTLR first.
ANTLR may seem complex and heavy but you don't need to use all of the functionality that it includes; it's nicely layered. I'm a big fan of using it to develop parsers. For starters, you can use the excellent ANTLRWorks to visualize and test the grammars that you are creating. It's really nice to be able to watch it capture tokens, build parse trees and step through the process.
For your text editor project, I would check out filter grammars, which might suit your needs nicely. For filter grammars you don't need to specify the entire lexical structure of your language, only the parts that you care about (i.e. need to highlight, color or index) and you can always add in more until you can handle a whole language.
Google Code has a new project, acacia-lex. I wrote it myself; so far it is a simple Java lexer that uses javax annotations.
SableCC
Another interesting option (which I didn't try yet) would be Xtext, which uses Antlr but also includes tools for creating Eclipse editors for your language.
ANTLR is the way to go. I would not build it by hand. You'll also find if you look around on the ANTLR web site that grammars are available for Java, XML, etc.
Another option would be Xtext. It will not only generate a parser for your grammar, but also a complete editor with syntax coloring, error markers, content assist and outline view.
I've done it with JFlex before and was quite satisfied with it. But the language I was highlighting was simple enough that I didn't need a parser generator, so your mileage may vary.
JLex and CUP are decent lexer and parser generators, respectively. I'm currently using both to develop a simple scripting language for a project I'm working on.
I don't think that you need a lexer. All you need to do is read the file extension to detect the language, then look up that language's keywords in an XML file that lists them and highlight them.
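A rough sketch of this keyword-lookup approach, with the keyword table hard-coded rather than loaded from XML. The KeywordHighlighter class, the [[...]] markers and the short keyword list are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordHighlighter {
    // Hypothetical keyword table; the answer suggests loading this from an XML file instead.
    static final Map<String, Set<String>> KEYWORDS = Map.of(
            "java", Set.of("class", "public", "static", "void", "return", "if", "else"));

    // Identifier-like words; everything else is copied through unchanged.
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Wraps every keyword of the language chosen by file extension in [[...]] markers.
    static String highlight(String fileName, String source) {
        String ext = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase();
        Set<String> keywords = KEYWORDS.getOrDefault(ext, Set.of());
        Matcher m = WORD.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start()); // text between words
            String word = m.group();
            out.append(keywords.contains(word) ? "[[" + word + "]]" : word);
            last = m.end();
        }
        return out.append(source.substring(last)).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Demo.java", "public class Demo { String s = \"class\"; }"));
        // prints [[public]] [[class]] Demo { String s = "[[class]]"; }
    }
}
```

Note that the word class inside the string literal is highlighted too. Keyword matching alone cannot tell strings and comments apart from code, which is the usual argument for using at least a real lexer.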
