I want to write a parser and converter for haml-like languages that parses them and converts them into HTML.
People usually seem to use regular expressions for this, but that means writing a lot of difficult regular expressions, which is not easy. Are there any tools or libraries for this? I would prefer something in Java that is easy to use.
Also, are there any articles about how to write such a parser? Thanks in advance!
Regular expressions are usually a poor man's parser. A regex is not a real parser.
Parsers are usually generated by a parser generator. You describe the language in a grammar specification file, and the parser generator converts it into source code for your parser.
After some research and testing, I have to say, parboiled is the best tool for this job.
I spent one day on PEG and the good examples parboiled provides, and another day writing a simple Sass parser. It was easy and natural, much easier and clearer than regexes. And the best thing is that I can write the whole program in Java, with no external DSL to learn.
I want to say thank you very much to the author of parboiled; it's the great tool I was looking for.
You can use JavaCC. It is a yacc-like parser generator. The output is the Java source code for the parser.
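To make the nesting problem concrete, here is a minimal hand-written sketch in plain Java. It uses none of the tools mentioned above. It assumes a hypothetical, heavily simplified haml-like syntax (a line is either `%tag optional text` or plain text, and nesting is expressed by leading spaces), and the class and method names are made up for illustration. The point is the stack of open tags, which is the part that line-by-line regular expressions cannot track.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified haml-like converter: "%tag text" lines nested by indentation. */
final class TinyHaml {

    static String toHtml(String source) {
        StringBuilder out = new StringBuilder();
        Deque<String> openTags = new ArrayDeque<>();      // tags still waiting for a closing tag
        Deque<Integer> openIndents = new ArrayDeque<>();  // indentation of each open tag

        for (String line : source.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no structure
            }
            int indent = 0;
            while (line.charAt(indent) == ' ') {
                indent++;
            }
            String content = line.substring(indent).trim();

            // Close every open tag that is not a parent of the current line.
            while (!openIndents.isEmpty() && openIndents.peek() >= indent) {
                openIndents.pop();
                out.append("</").append(openTags.pop()).append(">");
            }

            if (content.startsWith("%")) {
                int space = content.indexOf(' ');
                String tag = space < 0 ? content.substring(1) : content.substring(1, space);
                out.append("<").append(tag).append(">");
                if (space >= 0) {
                    out.append(escape(content.substring(space + 1).trim()));
                }
                openTags.push(tag);
                openIndents.push(indent);
            } else {
                out.append(escape(content)); // plain text line
            }
        }
        // Close whatever is still open at the end of the input.
        while (!openTags.isEmpty()) {
            out.append("</").append(openTags.pop()).append(">");
        }
        return out.toString();
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, `TinyHaml.toHtml("%ul\n  %li one\n  %li two")` returns `<ul><li>one</li><li>two</li></ul>`. Attributes, classes, ids and filters are left out on purpose; once those are needed, a grammar-based tool such as the ones named above is probably the better route.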
Related
So I started to write a parser for OCaml in Scala with the Scala CombinatorParser,
but I get the feeling that this is not the right tool for the job.
Especially getting the precedences and associativity of operators and non-closed constructions right can be challenging.
So my question is: what's the best way to write a real-world parser such as one for OCaml?
I looked into parser generators like ANTLR, but there are many of them and I have no idea which one would actually make the job easier.
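For the precedence and associativity part of the question, one common hand-written technique is precedence climbing. The sketch below is in Java rather than Scala, and it is only an illustration under stated assumptions: the operator table is made up (it is not OCaml's real operator table), operands are plain alphanumeric tokens, and the "tree" is returned as a fully parenthesised string so that the grouping is visible.

```java
import java.util.Map;

/** Precedence-climbing sketch; the operator table is illustrative, not OCaml's. */
final class PrecedenceClimbing {
    // Hypothetical table: operator -> binding power. '^' is treated as right-associative.
    private static final Map<Character, Integer> PRECEDENCE =
            Map.of('+', 1, '-', 1, '*', 2, '/', 2, '^', 3);

    private final String input;
    private int pos = 0;

    private PrecedenceClimbing(String input) {
        this.input = input.replace(" ", "");
    }

    /** Returns a fully parenthesised rendering of the parsed expression. */
    static String parse(String text) {
        PrecedenceClimbing p = new PrecedenceClimbing(text);
        String tree = p.expression(1);
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return tree;
    }

    private String expression(int minPrecedence) {
        String left = operand();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            Integer prec = PRECEDENCE.get(op);
            if (prec == null || prec < minPrecedence) {
                break; // not an operator, or it binds too weakly for this level
            }
            pos++;
            // Left-associative: the right side must bind strictly tighter.
            // Right-associative: the same precedence may appear again on the right.
            int nextMin = (op == '^') ? prec : prec + 1;
            String right = expression(nextMin);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String operand() {
        if (pos >= input.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        if (input.charAt(pos) == '(') {
            pos++;
            String inner = expression(1);
            if (pos >= input.length() || input.charAt(pos) != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected an operand at position " + pos);
        }
        return input.substring(start, pos);
    }
}
```

With this table, `PrecedenceClimbing.parse("1 - 2 - 3")` gives `((1 - 2) - 3)` and `PrecedenceClimbing.parse("2 ^ 3 ^ 2")` gives `(2 ^ (3 ^ 2))`. It does not address the non-closed constructions mentioned in the question; those need grammar-level decisions that a table of binding powers cannot express.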
You can have a look at JavaCC generator. I find it quite useful to make DSL parsers. I guess it's a good candidate for parsing "real" languages too.
The OCaml parser is implemented in pretty straightforward lex+yacc. Therefore, the easiest way is to port the rules using the equivalent lex+yacc toolset in your language.
I do not mean that converting the OCaml parsing rules to LL(k) (e.g. Parsec) is completely impossible. Actually it is not very difficult if you write an automatic conversion tool: see my blog entry about it http://camlspotter.blogspot.sg/2011/05/planck-small-parser-combinator-library.html But by hand, it is almost impossible to do correctly in a short time.
-edit-
On second thought, the easiest way, if you are not a Scala/Java purist, is to use the original OCaml parser and write some OCaml code that outputs its AST in a form that is easy to parse from any other language, for example S-expressions.
You may want to check out ANTLR. For small DSLs I found it very usable. I assume it can handle complex languages as well.
Scala has an amazingly simple way to create parsers. Is there a fairly equivalent way of doing the same thing in the Java-only world that doesn't take a week to learn?
I'm not sure about the learning curve, but in the Java world, the ANTLR parser generator is very well regarded and considered among the best.
How robust and how configurable does the parser need to be? If the grammar is fairly simple and stable you could just use a recursive descent parser, which uses methods that represent each grammar production rule. I think the output would be roughly what JavaCC would produce, as they are both top-down.
http://en.wikipedia.org/wiki/Recursive_descent_parser
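As a rough illustration of "methods that represent each grammar production rule", here is a minimal sketch for a small arithmetic grammar. The grammar and the class name are assumptions chosen for the example; they are not taken from any tool mentioned in this thread.

```java
/** Recursive descent sketch: one method per production of a small arithmetic grammar. */
final class ArithmeticParser {
    private final String input;
    private int pos = 0;

    private ArithmeticParser(String input) {
        this.input = input.replace(" ", "");
    }

    static int evaluate(String text) {
        ArithmeticParser parser = new ArithmeticParser(text);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right; // integer division
        }
        return value;
    }

    // factor := NUMBER | '(' expr ')'
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Returns the current character, or '\0' at the end of the input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }
}
```

`ArithmeticParser.evaluate("2 + 3 * (4 - 1)")` returns 11. Precedence falls out of which method calls which (`expr` calls `term`, `term` calls `factor`), and the loops make the binary operators left-associative.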
Hope this might be helpful.
Manning Publications has a book, "DSLs in Action", that covers Java in the beginning.
But you may want to look at using Groovy to write your DSL, as a dynamic language opens up a great many opportunities, and it would have a shorter learning curve than Scala does.
For an introduction you can start with http://docs.codehaus.org/display/GROOVY/Writing+Domain-Specific+Languages.
The book I mentioned also covers ANTLR, including when it makes sense to use it and when it doesn't, so if you want a better understanding of how to write and maintain a DSL, it is an excellent book.
I wish to create an app that translates input Java code into HTML-formatted Java code.
For example:
public class ReadWithScanner
Would become
<span class="public">public</span> <span class="class">class</span> ReadWithScanner
However, it gets quite complicated when it comes to parameters and regular expressions. Now I have a bit of time on my hands, and I wish to write my own code parser.
How would I start this? Are there any tutorials or online resources that would help me not only write it, but also understand it?
Thanks
For help with the complexity of parsing, you'll need to rely on the Java Language Specification.
As I seem to recall, Java is an LL(k) language (see here, for instance). However, the Java language, despite all attempts to keep it "compact", is still quite large and complex. The grammar is spread out over the entire document. This is not a project for the faint of heart. You might consider using a Java parsing tool (like Java-front).
What you need is ANTLR. It already has grammars for parsing Java, so you just need to supply your own templates to output whatever you want from the abstract syntax tree that ANTLR generates.
If you need a resource for learning about parsers, I can recommend Basics of Compiler Design, which is available as a free download.
It covers more than just parsers, but if you read the first few chapters, you should have a good basic understanding of both lexers and parsers.
I think you need a lexical analyzer.
I have used the Flex lexical analyzer in the past. It is not too complicated to use.
If you need to parse the analyzed text, you can use Bisonc++:
bisoncpp.sourceforge.net/
(This requires C++ knowledge and a Linux environment.)
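Since the question asks for Java, the lexical-analyzer idea can also be sketched by hand without Flex. The following is a minimal sketch, not a complete highlighter: the keyword set is a small illustrative subset, comments and character literals are not handled, and the class name is made up. It does show why scanning token by token helps, because keywords inside string literals are left alone.

```java
import java.util.Set;

/** Hand-written scanner sketch that wraps Java keywords in span tags. */
final class JavaHighlighter {
    // Illustrative subset of Java keywords; a real tool would list all of them.
    private static final Set<String> KEYWORDS =
            Set.of("public", "private", "class", "static", "void", "return", "new", "if", "else");

    static String highlight(String source) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isJavaIdentifierStart(c)) {
                // Identifier or keyword: consume the whole word first, then classify it.
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                if (KEYWORDS.contains(word)) {
                    out.append("<span class=\"").append(word).append("\">")
                       .append(word).append("</span>");
                } else {
                    out.append(word);
                }
            } else if (c == '"') {
                // String literal: copy it through unhighlighted, honouring backslash escapes.
                int start = i++;
                while (i < source.length() && source.charAt(i) != '"') {
                    if (source.charAt(i) == '\\') {
                        i++; // skip the escaped character
                    }
                    i++;
                }
                i = Math.min(i + 1, source.length()); // include the closing quote if present
                out.append(escape(source.substring(start, i)));
            } else {
                out.append(escape(String.valueOf(c)));
                i++;
            }
        }
        return out.toString();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

`JavaHighlighter.highlight("public class ReadWithScanner")` produces the output shown in the question. For anything beyond keywords (parameters, generics, annotations), the grammar-based tools suggested above are likely to be less work than extending this by hand.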
I'm working on a compiler design project in Java. Lexical analysis is done (using JFlex), and I'm wondering which yacc-like tool would be best (most efficient, easiest to use, etc.) for doing syntactic analysis, and why.
If you specifically want YACC-like behavior (table-driven), the only one I know is CUP.
In the Java world, it seems that more people lean toward tools that generate recursive descent parsers, like ANTLR or JavaCC.
And efficiency is seldom a reason to pick a parser generator.
In the past, I've used ANTLR for both lexer and parser, and the JFlex homepage says it can interoperate with ANTLR. I wouldn't say that ANTLR's online documentation is that great. I ended up investing in 'The Definitive ANTLR Reference', which helped considerably.
GNU Bison has a Java interface,
http://www.gnu.org/software/bison/manual/html_node/Java-Bison-Interface.html
You can use it to generate Java code.
There is also Jacc.
Jacc is about as close to yacc as you can get, but it is implemented in pure Java and generates a Java parser.
It interfaces well with JFlex:
http://web.cecs.pdx.edu/~mpj/jacc/
Another option would be the GOLD Parser.
Unlike many of the alternatives, the GOLD parser generates the parsing tables from the grammar and places them in a binary, non-executable file. Each supported language then has an engine which reads the binary tables and parses your source file.
I've not used the Java implementation specifically, but have used the Delphi engine with fairly good results.
I'm developing a small programming language based mostly on the C99 standard. I've already written a fairly decent lexer in Java, and now I'm looking to generate a Java parser from the grammar. I know there's Bison, but that seems to only generate C code. I'm looking for an application that will let me input my grammar and create a full parser class in Java code. Reading other SO posts on related topics, I've found ANTLR, but I'm wondering if anyone on SO knows of a better tool?
Thanks!
Another couple to look at are JavaCC and SableCC (it has been a long time since I looked at SableCC).
I've been quite impressed by BNFC, which is able to generate parsers in Java as well as in C, C++, C#, F#, Haskell, and OCaml.
The JFlex home page at http://jflex.de indicates where to find Bison-like tools that can target Java:
http://byaccj.sourceforge.net/
http://www2.cs.tum.edu/projects/cup/
http://www.antlr.org/