I want to parse some data, and I have a BNF grammar to parse it with. Can anyone recommend any grammar compilers capable of generating code that can be used on a mobile device?
Since this is for JavaME, the generated code must be:
Hopefully pretty small
Low dependencies on exotic Java libraries
Not dependent on any runtime JAR files.
I have used JFlex before, and I know it satisfies your second and third requirements. But I don't know how big the generated code might be. According to the manual, it generates a packed DFA table by default, so it might not be too bad.
The first question is: do you have an existing grammar definition? When I've ported an LALR grammar to Java, I've used JFlex/CUP.
If you're starting from scratch, I'd suggest JavaCC/FreeCC, which generates an LL(k) parser. It's quite well documented and there are no runtime dependencies.
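For a sense of how self-contained the generated code is, here is a minimal sketch of driving a JavaCC-generated parser. The class name ExprParser and the start production Start() are hypothetical, but the constructor-from-stream pattern is standard for JavaCC output, and everything it needs is emitted as plain Java source, so there is no runtime JAR to ship to the device:

    import java.io.ByteArrayInputStream;

    public class ParseDemo {
        public static void main(String[] args) throws ParseException {
            // ExprParser.java and its few support classes (including
            // ParseException) are all generated as plain Java source.
            ExprParser parser = new ExprParser(
                    new ByteArrayInputStream("1 + 2 * 3".getBytes()));
            parser.Start(); // method named after the grammar's start production
        }
    }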
I was wondering whether it is possible to take the ANTLR grammar (*.g) or the generated parsers (from this grammar) and use them in a separate project.
For this I was looking into the SysMLv2 (Eclipse-based) project on GitHub, where Xtext was used to define the grammar of this new modelling language. The grammar and the generated parsers can be found here.
My first idea was just to take the grammar file (InternalAlf.g) and use ANTLR (I tried 3.5.0 and 3.5.2) to generate the parser and lexer. Doing this, I end up with a bunch of error messages that symbols were not found (the symbol in question: EObject).
Then, since it is obviously an Eclipse project, I figured another naive solution would be to package the whole project as a JAR and include it as a library in mine. I tried to use Eclipse for that (Export -> Executable JAR). That option requires a main class, and I am not sure which one to pick, which also makes me doubt this approach. The other JAR export option does not let me add the necessary dependencies to my JAR.
Any other proposals? Since the ANTLR grammar file is available, it should actually be quite easy to generate the parser, but I am not sure how to do this, since the grammar file has a bunch of dependencies. Or, to rephrase the question: how do I deal with this type of ANTLR grammar file (one that has dependencies on Java libraries)? In typical ANTLR tutorials, I (as a newbie in ANTLR and Xtext) could not find the answer.
Best regards
I looked at the grammar in that project. It is HIGHLY specific to Xtext (to the point that it's a bit difficult to find the ANTLR grammar amongst all of the actions).
You might be able to use the ANTLR3 grammar to parse it and discard all of the actions, etc. that make it so tightly coupled to Xtext (being careful about any semantic predicates and dependencies they might have on those actions). Emphasis on the MIGHT here.
In short, it’s not going to be at all simple to generate a parser divorced from Xtext using this grammar.
If you were to elaborate on what you need to accomplish, and why you can't just use the Xtext SysMLv2 implementation but instead feel a need to create a separate parser, someone might be able to point you in an appropriate direction.
I am learning compiler construction and want to implement the JavaScript grammar using JavaCC.
(I have already written my own JavaScript CodeModel which allows programmatic construction of the JavaScript code, now I want to write a JavaCC-based parser counterpart for that.)
My question is, is there a way to modularize the JavaCC grammar (.jj-file) into several files?
I have very good experience with JavaParser, so I am learning from their java_1_5.jj grammar. However, this is a 3000+ LoC file, which is a bit hard to comprehend.
I would like to divide the grammar file into several parts so that it's easier to handle and understand. My Google searches for "javacc modular", "javacc include" and "javacc import" brought me some cryptic results which did not help much.
To be specific, how would I move the definition of the IDENTIFIER (lines 380-1081) to another file?
There is no built-in way in JavaCC to modularize .jj files. The best thing to do is often to use JJT, as this allows you to move all actions out of the grammar file. If you don't want to use JJT, the next best thing may be to use the builder pattern, as sketched below.
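To illustrate the builder-pattern idea (all names here are hypothetical): keep each action in the .jj file down to a one-line call on a builder object, and put the real construction logic in ordinary Java classes, which you can split across as many files and packages as you like.

    // AstBuilder.java -- plain Java, independent of the grammar file.
    // Actions in the .jj file shrink to one-liners such as:
    //   { builder.beginClass(t.image); }
    public interface AstBuilder {
        void beginClass(String name);
        void addField(String type, String name);
        void endClass();
    }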
If you just want an include facility, there are many preprocessors that can be used.
Yes, you can create various classes and pass parameters by creating objects and calling them from the try/catch actions inside the JavaCC file, which will make it look modular.
I'm aware that the question was posed nearly six years ago, but I'll answer even so, since people will be looking for an answer to this now and again.
The most advanced version of JavaCC is JavaCC 21, and JavaCC 21 does have (among other things) an INCLUDE directive that, as best I can guess, is exactly what you are looking for.
There are actually quite a few other features in JavaCC 21 that are not present in the legacy JavaCC project. Here is a biggie: the longstanding bug in which nested syntactic lookahead does not work correctly has been fixed. See here.
I need to implement an IDL-to-Java compiler. In fact, it's not exactly IDL-to-Java: the interface definition language is extended. So I need to implement a compiler which can generate Java source files. I know nothing about CORBA and I find it hard to start. Do you think it's possible for me to finish this work in half a year? And if so, what should I do? PS: Please forgive my English.
If you don't know anything about parsers and parser generators it's going to be a tough job, but I think that half a year should be plenty if you don't start from scratch.
I suggest that you use Antlr, which happens to have an IDL parser implementation among its contributed examples. This is probably for an older version of Antlr, but it's definitely a good starting point. Be sure to get hold of the Antlr book, you're going to need it!
For the code generation part you could use StringTemplate, a template engine written by Antlr's author, Terence Parr, exactly for this purpose.
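To give a taste of that approach, here is a minimal sketch using the ST4 runtime (org.stringtemplate.v4); the template text and attribute names are made up for illustration:

    import org.stringtemplate.v4.ST;

    public class CodegenDemo {
        public static void main(String[] args) {
            // Attributes are injected by name; the template itself stays logic-free.
            ST st = new ST("public interface <name> { <type> <method>(); }");
            st.add("name", "Hello");
            st.add("type", "String");
            st.add("method", "sayHello");
            System.out.println(st.render());
            // -> public interface Hello { String sayHello(); }
        }
    }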
If you really have to implement a whole ORB you might as well check out how others did it, e.g. here.
A true IDL-to-Java compiler not only emits Java code that maps back to the IDL definitions (strictly adhering to the OMG standards). It also generates Java code that allows your definitions to work with an underlying CORBA stack (not unlike a true compiler generating instructions for a target hardware architecture).
That is, an IDL compiler
1) takes your IDL definitions and converts them into CORBA-stack-independent, language-specific definitions (in your case, in Java).
2) In addition to that, it generates CORBA-stack/vendor specific code as well.
If all you need is something that does #1, then it's not an IDL-to-Java compiler (not in the true sense of the word). But we can call it that for the sake of simplicity.
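To make the #1/#2 split concrete, here is roughly what the standard OMG mapping produces for a trivial IDL interface (a sketch from memory, following the idlj file-naming convention):

    // For the IDL:   interface Hello { string sayHello(); };
    //
    // Part #1 (the language mapping) yields, roughly:

    // HelloOperations.java -- the plain Java view of the IDL operations
    public interface HelloOperations {
        String sayHello();
    }

    // Hello.java -- ties the operations into the CORBA type system
    public interface Hello extends HelloOperations,
            org.omg.CORBA.Object, org.omg.CORBA.portable.IDLEntity {
    }

    // Part #2 would add _HelloStub, HelloPOA, HelloHelper, etc. --
    // the stack/vendor-specific plumbing.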
So you have two possible routes here:
1) Look at the source code of IDL compilers from existing CORBA stacks that are Java based (OpenOrb or JacOrb), or
2) Look at the OMG's specs that tell you how to map from IDL to your language of choice: http://www.omg.org/technology/documents/idl2x_spec_catalog.htm
This is all assuming you know about compiler theory and implementation. Otherwise, if this is an experiment for learning, great! But if this is part of work with a deadline, this could be an unrealistic task.
Either way, good luck.
You can use idl4emf:
http://code.google.com/p/idl4emf/
This project is composed of an IDL grammar implementation in Xtext and an IDL metamodel implementation in Ecore.
This project also includes a code generator for IDL files. You can implement your own generator from IDL files just by writing Xpand templates in Eclipse EMF.
I've used this project as part of several generator projects successfully.
I'm working on a compiler design project in Java. Lexical analysis is done (using jflex) and I'm wondering which yacc-like tool would be best(most efficient, easiest to use, etc.) for doing syntactical analysis and why.
If you specifically want YACC-like behavior (table-driven), the only one I know is CUP.
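For reference, the driving code for a CUP parser is short. This sketch assumes the default generated class name (literally parser) and a JFlex scanner built with the %cup directive, so that it implements java_cup.runtime.Scanner; the Lexer class name is whatever your JFlex %class directive says:

    import java.io.FileReader;
    import java_cup.runtime.Symbol;

    public class Main {
        public static void main(String[] args) throws Exception {
            // The CUP-generated parser pulls tokens from any
            // java_cup.runtime.Scanner implementation.
            parser p = new parser(new Lexer(new FileReader(args[0])));
            Symbol result = p.parse();        // runs the table-driven LALR parse
            System.out.println(result.value); // whatever the start symbol produced
        }
    }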
In the Java world, it seems that more people lean toward recursive descent parsers like ANTLR or JavaCC.
And efficiency is seldom a reason to pick a parser generator.
In the past, I've used ANTLR for both lexer and parser, and the JFlex homepage says it can interoperate with ANTLR. I wouldn't say that ANTLR's online documentation is that great. I ended up investing in 'The Definitive ANTLR Reference', which helped considerably.
GNU Bison has a Java interface,
http://www.gnu.org/software/bison/manual/html_node/Java-Bison-Interface.html
You can use it to generate Java code.
There is also jacc.
Jacc is about as close to yacc as you can get, but it is implemented in pure java and generates a java parser.
It interfaces well with JFlex:
http://web.cecs.pdx.edu/~mpj/jacc/
Another option would be the GOLD Parser.
Unlike many of the alternatives, the GOLD parser generates the parsing tables from the grammar and places them in a binary, non-executable file. Each supported language then has an engine which reads the binary tables and parses your source file.
I've not used the Java implementation specifically, but have used the Delphi engine with fairly good results.
I am working on a small text editor project and want to add basic syntax highlighting for a couple of languages (Java, XML... just to name a few). As a learning experience I wanted to add one of the popular (or not so popular) Java lexer/parser generators.
Which project do you recommend? ANTLR is probably the best known, but it seems pretty complex and heavyweight.
Here are the options that I know of.
Antlr
Ragel (yes, it can generate Java source for processing input)
Do it yourself (I guess I could write a simple token parser and highlight the source code).
ANTLR or JavaCC would be the two I know. I'd recommend ANTLR first.
ANTLR may seem complex and heavy but you don't need to use all of the functionality that it includes; it's nicely layered. I'm a big fan of using it to develop parsers. For starters, you can use the excellent ANTLRWorks to visualize and test the grammars that you are creating. It's really nice to be able to watch it capture tokens, build parse trees and step through the process.
For your text editor project, I would check out filter grammars, which might suit your needs nicely. For filter grammars you don't need to specify the entire lexical structure of your language, only the parts that you care about (i.e. need to highlight, color or index) and you can always add in more until you can handle a whole language.
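Once ANTLR has generated the lexer from a filter grammar, the Java side is just a token-pulling loop. This sketch assumes the ANTLR 3 runtime and a hypothetical generated lexer called FuzzyJavaLexer:

    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.Token;

    public class HighlightDemo {
        public static void main(String[] args) {
            String source = "public class Foo { int x; }";
            // A filter grammar defines only the tokens you care about;
            // the lexer silently skips everything in between.
            FuzzyJavaLexer lexer = new FuzzyJavaLexer(new ANTLRStringStream(source));
            for (Token t = lexer.nextToken(); t.getType() != Token.EOF;
                    t = lexer.nextToken()) {
                System.out.println(t.getText() + " @ line " + t.getLine());
            }
        }
    }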
Google Code has a new project, acacia-lex. Written by myself, it is a simple (so far) Java lexer using javax annotations.
SableCC
Another interesting option (which I haven't tried yet) would be Xtext, which uses ANTLR but also includes tools for creating Eclipse editors for your language.
ANTLR is the way to go. I would not build it by hand. You'll also find if you look around on the ANTLR web site that grammars are available for Java, XML, etc.
Another option would be Xtext. It will not only generate a parser for your grammar, but also a complete editor with syntax coloring, error markers, content assist and outline view.
I've done it with JFlex before and was quite satisfied with it. But the language I was highlighting was simple enough that I didn't need a parser generator, so your mileage may vary.
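For comparison, consuming a JFlex-generated lexer looks like this. The sketch assumes a spec with %class SyntaxLexer and %int, so that yylex() returns int token codes and YYEOF at end of input; the token constants here are made up:

    import java.io.StringReader;

    public class EditorHighlighter {
        // Token codes that the (hypothetical) JFlex spec's actions return.
        static final int KEYWORD = 1, STRING = 2, COMMENT = 3;

        public static void main(String[] args) throws java.io.IOException {
            SyntaxLexer lexer = new SyntaxLexer(new StringReader("int x = 0;"));
            int token;
            while ((token = lexer.yylex()) != SyntaxLexer.YYEOF) {
                if (token == KEYWORD) {
                    // yytext() is the lexeme just matched -- enough to color a span.
                    System.out.println("keyword: " + lexer.yytext());
                }
            }
        }
    }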
JLex and CUP are decent lexer and parser generators, respectively. I'm currently using both to develop a simple scripting language for a project I'm working on.
I don't think that you need a lexer. All you need is to first read the file extension to detect the language, and then, using an XML file that lists the language's keywords, easily find them and highlight them.
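That can indeed be as small as one regex pass. A rough sketch (the keyword list is inlined here instead of coming from the XML file, and the "highlighting" is reduced to wrapping matches in markers):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KeywordHighlighter {
        public static void main(String[] args) {
            // In the real editor these would be read from the per-language
            // XML file chosen by file extension.
            String[] keywords = {"public", "class", "int", "return"};
            Pattern p = Pattern.compile("\\b(" + String.join("|", keywords) + ")\\b");

            String source = "public class Foo { int x; }";
            Matcher m = p.matcher(source);
            // Wrap each keyword in markers; a real editor would apply text styles.
            System.out.println(m.replaceAll("[[$1]]"));
        }
    }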