Transforming a .mm file into a readable form with Java

I am developing a multi-mode resource-constrained project scheduling solver in Java. I was looking for test instances, but this is the only thing I found. It is a .mm file, and .mm is the extension used for (Objective-)C++ source files. Is there any way to transform this data into something easily readable by Java, like XML or JSON?

As suggested, you could of course parse the file as a text file. Alternatively, the two other main approaches would be:
Use Clang/LLVM's abstract syntax tree (AST) to interpret the data in the file.
Use an Objective-C++ grammar with a compiler generator like yacc or, since you're using Java, JavaCC. This will also yield a syntax tree that you can then walk and extract information from.
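If the .mm instance turns out to be ordinary plain text (opening it in a text editor will tell you), the first option, parsing it as a text file, needs nothing beyond the JDK. The sketch below assumes the data you care about sits on lines of whitespace-separated integers and simply skips every other line. NumericRowReader and readNumericRows are hypothetical names, and a real instance format would need its own rules for deciding which section each row belongs to.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumericRowReader {
    /**
     * Reads plain text line by line and keeps only the lines whose
     * whitespace-separated tokens are all integers; header and separator lines are skipped.
     */
    public static List<int[]> readNumericRows(Reader source) throws IOException {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // blank line
                }
                String[] tokens = trimmed.split("\\s+");
                int[] row = new int[tokens.length];
                boolean numeric = true;
                for (int i = 0; i < tokens.length && numeric; i++) {
                    try {
                        row[i] = Integer.parseInt(tokens[i]);
                    } catch (NumberFormatException e) {
                        numeric = false; // a label or header line, not data
                    }
                }
                if (numeric) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "some header text\n1 2 3\n\n4 5\n";
        for (int[] row : readNumericRows(new StringReader(sample))) {
            System.out.println(Arrays.toString(row)); // prints [1, 2, 3] then [4, 5]
        }
    }
}
```

To read an actual file, you could pass Files.newBufferedReader(Path.of("instance.mm")) (a hypothetical path) instead of the StringReader. From the resulting rows you can build your own Java objects, or serialize them to XML or JSON with whichever library you prefer.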

Related

Which tool is proper for extracting information from ANTLR outputs?

I am new to ANTLR. I ran ANTLR on a grammar and got lexer.java and parser.java files. I tested them with a simple example: the interpreter tab showed the proper tree and the debugger tab showed the parse tree. Now I want to extract specific information from it. I would like to know whether I need an AST or not, and whether there is any tool compatible with ANTLR for extracting data.
Thanks.

According to the ANTLR4 wiki, an ANTLR-generated parser builds a parse tree data structure, and ANTLR provides a tree walker class that you can use to traverse it. You could use this mechanism to extract information. Note that you'd need to write a "listener" class in Java to extract the information and output it (or do whatever else you need with it).
For more details, see https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners
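To make the listener idea concrete without depending on a generated grammar, here is a hand-rolled sketch of the same pattern in plain Java: a walker visits the tree depth-first and calls enter/exit hooks, and a listener overrides only the hooks it needs. Node, TreeListener, walk and collectIds are hypothetical names and are not ANTLR's API; with ANTLR4 you would instead extend the generated *BaseListener class and call ParseTreeWalker.DEFAULT.walk(listener, tree).

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled illustration of the listener pattern; this is NOT the ANTLR runtime API. */
public class ListenerSketch {
    /** Hypothetical stand-in for a parse-tree node: a rule name, optional token text, children. */
    public record Node(String rule, String text, List<Node> children) {
        public static Node leaf(String rule, String text) {
            return new Node(rule, text, List.of());
        }

        public static Node of(String rule, Node... kids) {
            return new Node(rule, null, List.of(kids));
        }
    }

    /** Hooks fired on the way into and out of each node (ANTLR generates enterX/exitX per rule). */
    public interface TreeListener {
        default void enter(Node node) {}

        default void exit(Node node) {}
    }

    /** Depth-first walk: enter a node, walk its children in order, then exit it. */
    public static void walk(TreeListener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children()) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Example listener: collects the text of every "id" node, in document order. */
    public static List<String> collectIds(Node tree) {
        List<String> ids = new ArrayList<>();
        walk(new TreeListener() {
            @Override
            public void enter(Node node) {
                if (node.rule().equals("id")) {
                    ids.add(node.text());
                }
            }
        }, tree);
        return ids;
    }

    public static void main(String[] args) {
        // Roughly the tree for "x = y + 1" under a hypothetical grammar.
        Node tree = Node.of("assign",
                Node.leaf("id", "x"),
                Node.leaf("op", "="),
                Node.of("expr", Node.leaf("id", "y"), Node.leaf("op", "+"), Node.leaf("num", "1")));
        System.out.println(collectIds(tree)); // prints [x, y]
    }
}
```

The walker owns the traversal and the listener owns the extraction logic, which is why the same tree can be fed to several listeners that each pull out different information.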
UPDATE
Since you are using ANTLR3, these links are more relevant to you:
Tree pattern matching.
Tree Construction.
FAQ: Tree construction.
I strongly recommend that you take the time to read the available documentation.

What is the best way to parse a non-flat file format in Java?

I am attempting to parse a nested file format in Java.
The file format looks like this:
head [
A [
property value
property2 value
property3 [
... down the rabbit hole ...
]
]
... more As ...
B [
.. just the same as A
]
... more Bs ...
]
What is the best/easiest technique to parse this into my program?
Finite State Machine?
Manually read it word by word and keep track of what part of the structure I am in?
Write a grammar...?
As a side note, I have no control over the format - because I knew someone would say it!

If the grammar is indeed nested like this, writing a very simple top-down parser would be a trivial task: you have very few tokens to recognize, and the nested structure repeats itself very conveniently for a textbook recursive-descent parser.
I would not even bother with ANTLR or another parser generator for something this simple, because the learning curve would eat up the potential benefits for the project.*
* Potential benefits for you from learning a parser generator are hard to overestimate: if you can spend a day or two learning to build parsers with ANTLR, your view of structured text files will change forever.
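A minimal sketch of such a recursive-descent parser for the format in the question follows. It assumes that names and values are single tokens without spaces or brackets, and that [ and ] may or may not be separated from neighbouring tokens by whitespace. NestedParser and Node are hypothetical names. Each grammar rule becomes one method, and arbitrary nesting depth is handled by that method calling itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Recursive-descent parser for the nested "name [ ... ]" format from the question. */
public class NestedParser {
    /** A block ("name [ ... ]") has children and a null value; a property ("name value") has a value. */
    public static final class Node {
        public final String name;
        public final String value;
        public final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Tokens: '[', ']', or any run of characters that is neither whitespace nor a bracket.
    private static final Pattern TOKEN = Pattern.compile("\\[|\\]|[^\\s\\[\\]]+");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private NestedParser(String text) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** Parses one top-level item and rejects anything left over. */
    public static Node parse(String text) {
        NestedParser p = new NestedParser(text);
        Node root = p.item();
        if (p.pos != p.tokens.size()) {
            throw p.error("trailing input");
        }
        return root;
    }

    // item := NAME '[' item* ']'  |  NAME VALUE
    private Node item() {
        String name = next();
        if (name.equals("[") || name.equals("]")) {
            throw error("expected a name");
        }
        if (peekIs("[")) {
            pos++; // consume '['
            Node block = new Node(name, null);
            while (!peekIs("]")) {
                block.children.add(item()); // recursion handles any nesting depth
            }
            pos++; // consume ']'
            return block;
        }
        String value = next();
        if (value.equals("]")) {
            throw error("expected a value after " + name);
        }
        return new Node(name, value);
    }

    private boolean peekIs(String expected) {
        if (pos >= tokens.size()) {
            throw error("unexpected end of input");
        }
        return tokens.get(pos).equals(expected);
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw error("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at token " + pos);
    }
}
```

For example, NestedParser.parse("head [ A [ p 1 q [ r 2 ] ] B [ s 3 ] ]") returns a head node with two children, A and B. To parse a file, you could pass in Files.readString(path) (Java 11+).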

I second the recommendation to take a look at ANTLR. StAX adds SAX-like event handling; this page describes interfacing StAX to ANTLR:
http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR
Yes, there is a learning curve, but by the time you had handled all the odd cases and debugged your own code, you'd probably break even, plus you'd have a new item on your resume.

Arguably the easiest way to parse files of these kinds is using a recursive descent parser (http://en.m.wikipedia.org/wiki/Recursive_descent_parser). I guess this is what you mean by manually reading and keeping track of the structure you have found.
A finite state machine wouldn't work if you have to be able to deal with unlimited nesting. If there are only two levels it could be enough.
Writing a grammar and generating a parser would also work, but if you haven't done that before or don't have the time to learn how to use the tools it's probably overkill...
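The point about finite state machines can be seen directly. The format needs only two states (waiting for a name, or holding a name and waiting for [ or a value), but matching each ] to its block requires remembering an unbounded number of open blocks. Adding an explicit stack to the state machine, which turns it into a pushdown automaton, is enough. The sketch below does that and reports what it finds through SAX-style callbacks instead of building a tree. StackParser and Handler are hypothetical names, and the sketch assumes names and values are single tokens without whitespace or brackets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Two-state machine plus an explicit stack, reporting structure through callbacks. */
public class StackParser {
    /** SAX-style callbacks fired while scanning. */
    public interface Handler {
        void startBlock(String name);

        void property(String name, String value);

        void endBlock(String name);
    }

    // Tokens: '[', ']', or any run of characters that is neither whitespace nor a bracket.
    private static final Pattern TOKEN = Pattern.compile("\\[|\\]|[^\\s\\[\\]]+");

    public static void parse(String text, Handler handler) {
        Deque<String> open = new ArrayDeque<>(); // names of blocks not yet closed: the stack an FSM lacks
        String pendingName = null;               // null: expecting a name; otherwise: holding one
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String tok = m.group();
            if (pendingName == null) {
                if (tok.equals("]")) {
                    if (open.isEmpty()) {
                        throw new IllegalArgumentException("unmatched ]");
                    }
                    handler.endBlock(open.pop());
                } else if (tok.equals("[")) {
                    throw new IllegalArgumentException("[ without a name");
                } else {
                    pendingName = tok;
                }
            } else {
                if (tok.equals("[")) {
                    open.push(pendingName);
                    handler.startBlock(pendingName);
                } else if (tok.equals("]")) {
                    throw new IllegalArgumentException("missing value for " + pendingName);
                } else {
                    handler.property(pendingName, tok);
                }
                pendingName = null;
            }
        }
        if (pendingName != null || !open.isEmpty()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
    }
}
```

With a stack bounded to a fixed size this would collapse back into a plain finite state machine, which matches the remark that an FSM is enough when there are only a couple of levels.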

The fastest approach is to use a format that already works like this, e.g. JSON or YAML. These formats support nesting and are well supported by existing libraries.
As a side note, I have no control over the format
If you want to know the best way to parse something that is like YAML but isn't YAML, read the code of a simple YAML parser.
Just parsing the file is unlikely to be enough, you will also want to trigger events or generate a data model from the data you load.

Parsing / reading C-Header files using Java

I have a C header file defining a couple of structs, each containing multiple char arrays.
I'd like to parse these files using Java. Is there a library for reading C-Header files either into a structure or is there a stream parser that understands C-Header files?
Just for more background (I'm just looking for a C-Header parser, not a solution for this particular problem):
I have a text file containing data and a C-Header file explaining the structure. Both are a bit dynamic, so I don't want to generate Java class files.
Example:
#ifndef TYPE1
#define TYPE1
typedef struct type1
{
    char name1[10];
    char name2[5];
} type1;
#endif
Type2, Type3 etc are similar.
Data structure:
type1ffffffffffaaaaa

You can use an existing C parser for Java. It does a lot more than parsing header files, of course, but that shouldn't hurt you.
We use the parser from the Eclipse CDT project. This is an Eclipse plugin, but we successfully use it outside of Eclipse; we just have to bundle 3 JAR files of Eclipse with the parser JAR.
To use the CDT parser, start with an implementation of org.eclipse.cdt.core.model.ILanguage, for example org.eclipse.cdt.core.dom.ast.gnu.c.GCCLanguage. You can call getTranslationUnit on it, passing the code and some helper stuff. A code file is represented by a org.eclipse.cdt.core.parser.FileContent instance (at least in CDT7, this seems to change a lot). The easiest way to create such an object is FileContent.createForExternalFileLocation(filename) or FileContent.create(filename, content). This way you don't need to care about the Eclipse IFile stuff, which seems to work only within projects and workspaces.
The IASTTranslationUnit you get back represents the whole AST of the file. All the nodes therein are instances of IASTSomething types, for example IASTDeclaration etc. You can implement your own subclass of org.eclipse.cdt.core.dom.ast.ASTVisitor to iterate through the AST using the visitor pattern. If you need further help, just ask.
The JAR files we use are org.eclipse.cdt.core.jar, org.eclipse.core.resources.jar, org.eclipse.equinox.common.jar, and org.eclipse.osgi.jar.
Edit: I had found a paper which contains source code snippets for this:
"Using the Eclipse C/C++ Development Tooling as a Robust, Fully Functional, Actively Maintained, Open Source C++ Parser", but it is no longer available online (only as a shortened version).

Example using Eclipse CDT with only 2 jars.
- https://github.com/ricardojlrufino/eclipse-cdt-standalone-astparser
The example includes a class that displays the structure of a source file as a tree, and another example that interacts with the API.
Note that with this API (the Eclipse CDT parser) you can parse source code directly from a string in memory.
Another example of usage is:
https://github.com/ricardojlrufino/cplus-libparser
A library for extracting metadata (information about classes, methods, variables) from C/C++ source code.
See file:
https://github.com/ricardojlrufino/cplus-libparser/blob/master/src/main/java/br/com/criativasoft/cpluslibparser/SourceParser.java

As mentioned already, CDT is perfect for this task. But unlike the approach described above, I used it from within a plugin and was able to use IFiles, which makes everything much easier. To get the ITranslationUnit, just do:
ITranslationUnit tu = (ITranslationUnit) CoreModel.getDefault().create(myIFile);
IASTTranslationUnit ias = tu.getAST();
I was, for example, looking for a particular #define, so I could just call:
IASTPreprocessorStatement[] ppc = ias.getAllPreprocessorStatements();
This returns all the preprocessor statements, one per array element. Perfectly easy.

You can try ANTLR. There should already be existing C grammars available for it.

Generate HTML from plain text using Java

I have to convert a .log file into a nice, pretty HTML file with tables. Right now I just want to get the HTML header done. My current method is to println every single line of the HTML to the file, for example:
p.println("<html>");
p.println("<script>");
and so on. There has to be a simpler way, right?

How about using a JSP scriptlet and JSTL? You could create a custom object that holds all the important information and display it, formatted, using the Expression Language.

Printing raw HTML text as strings is probably the "easiest" (most straightforward) way to do what you're asking but it has its drawbacks (e.g. properly escaping the content text).
You could use the DOM (e.g. Document et al) interface provided by Java but that would hardly be "easy". Perhaps there are "DOM builder" type tools/libraries for Java that would simplify this task for you; I suggest looking at dom4j.
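Whichever way the markup is produced, the escaping problem mentioned above can be handled in one place. A minimal sketch, assuming each log line has already been split into columns: a hand-written escape helper plus a StringBuilder that emits the whole page with one table row per record. HtmlTable, escape and toHtml are hypothetical names, and the escape covers only the five characters that are special in HTML text and attribute values.

```java
import java.util.List;

public class HtmlTable {
    /** Escapes the five characters that are significant in HTML text and attribute values. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds a complete HTML page with one table row per record; every cell is escaped. */
    public static String toHtml(String title, List<String[]> rows) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n<table>\n");
        for (String[] row : rows) {
            html.append("  <tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>\n");
        }
        html.append("</table>\n</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"12:00:01", "INFO", "started"},
            new String[] {"12:00:02", "WARN", "x < y & y > z"});
        System.out.print(toHtml("Log report", rows));
    }
}
```

How a log line is split into columns depends on your log format; for a "time level message" layout, line.split("\\s+", 3) would be one possible choice.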

Look at this Java HTML Generator library (easy to use). It should make generating the actual HTML much clearer. Creating HTML from Java strings has complications (what happens if you want to change something like a rowspan?) that this library avoids, especially when dealing with tables.

There are many templating engines available. Have a look at https://stackoverflow.com/questions/174204/suggestions-for-a-java-based-templating-engine
This way you can define a template in a txt file and have the java code fill in the variables.
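To show the idea behind template engines, here is a deliberately tiny sketch that fills ${name} placeholders from a map. The ${...} syntax and the MiniTemplate class with its render method are assumptions for illustration; a real engine adds loops, conditionals and escaping, which matter for an HTML table built from log data.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tiny placeholder filler; a stand-in to illustrate template engines, not a replacement for one. */
public class MiniTemplate {
    // Placeholders look like ${name}, where name consists of letters, digits or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces every ${key} with vars.get(key); a missing key is an error rather than a silent blank. */
    public static String render(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        return m.replaceAll(match -> {
            String value = vars.get(match.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for ${" + match.group(1) + "}");
            }
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        String template = "<html><head><title>${title}</title></head><body>${table}</body></html>";
        System.out.println(render(template, Map.of("title", "Log report", "table", "<table></table>")));
    }
}
```

The template itself can live in a text file and be loaded with Files.readString(Path.of("report-template.html")) (a hypothetical file name, Java 11+). Note that render inserts values verbatim, so anything coming from the log should be HTML-escaped first.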

How to write LALR parser for some grammar in Java?

I want to write Java code to build a LALR parser for my grammar. Can someone please suggest some books or some links where I can learn how to write Java code for a LALR parser?

Writing a LALR parser by hand is difficult, but it can be done. If you want to learn the theory behind constructing parsers for them by hand, consider looking into "Parsing Techniques: A Practical Guide" by Grune and Jacobs. It's an excellent book on general parsing techniques, and the chapter on LR parsing is particularly good.
If you're more interested in just getting a LALR parser that is written in Java, consider looking into Java CUP, which is a general purpose parser generator for Java.
Hope this helps!

You can split the LALR functionality in two parts: preparation of the tables and parsing the input.
The first part is complex and error-prone, so even if you like knowing how it works, I suggest using a proven table generator for the LALR states (and for the tokenizer DFA as well).
The second part consists of consuming those tables using some quite simple algorithms to tokenize and process the input into a parse tree/concrete syntax tree. This is easier to implement yourself if you like to do so, and you still have full control over how it works and what it does.
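The second part, the driver that consumes the tables, is short enough to sketch. The example below hard-codes LALR(1) tables for the toy grammar E -> E + n | n (built by hand for this example, standing in for what a table generator would produce) and runs the standard shift/reduce loop with a state stack and a value stack. LrDriver and the string encoding of actions (s1 for shift to state 1, r2 for reduce by rule 2, acc for accept) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Table-driven LR driver for the toy grammar: rule 1  E -> E + n,  rule 2  E -> n. */
public class LrDriver {
    // ACTION[state][terminal]; "$" marks end of input. Missing entries are syntax errors.
    private static final Map<Integer, Map<String, String>> ACTION = Map.of(
        0, Map.of("n", "s1"),
        1, Map.of("+", "r2", "$", "r2"),
        2, Map.of("+", "s3", "$", "acc"),
        3, Map.of("n", "s4"),
        4, Map.of("+", "r1", "$", "r1"));
    // GOTO[state] on the nonterminal E; only state 0 has an E transition.
    private static final Map<Integer, Integer> GOTO_E = Map.of(0, 2);
    // Right-hand-side lengths of rule 1 and rule 2; index 0 is unused.
    private static final int[] RHS_LEN = {0, 3, 1};

    /** Parses tokens such as ["1", "+", "2"] and returns the sum they denote. */
    public static int parse(List<String> tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<Integer> values = new ArrayDeque<>();
        states.push(0);
        int pos = 0;
        while (true) {
            String lexeme = pos < tokens.size() ? tokens.get(pos) : "$";
            String kind = lexeme.equals("+") || lexeme.equals("$") ? lexeme : "n";
            String action = ACTION.get(states.peek()).get(kind);
            if (action == null) {
                throw new IllegalArgumentException("syntax error at token " + pos + ": " + lexeme);
            }
            if (action.equals("acc")) {
                return values.pop();
            }
            int target = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                // Shift: move to the target state and push the token's semantic value.
                states.push(target);
                values.push(kind.equals("n") ? Integer.parseInt(lexeme) : 0); // '+' carries no value
                pos++;
            } else {
                // Reduce: compute the rule's value, pop its right-hand side, then follow GOTO.
                int rule = target;
                int result;
                if (rule == 1) {      // E -> E + n
                    int n = values.pop();
                    values.pop();      // the '+'
                    int e = values.pop();
                    result = e + n;
                } else {               // E -> n
                    result = values.pop();
                }
                for (int i = 0; i < RHS_LEN[rule]; i++) {
                    states.pop();
                }
                states.push(GOTO_E.get(states.peek()));
                values.push(result);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1", "+", "2", "+", "3"))); // prints 6
    }
}
```

Generated tables would be much larger, but the loop stays the same: look up the action for the current state and token, shift or reduce, and on a reduce pop the right-hand side and follow the GOTO entry.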
When doing parsing tasks, I personally use the free GOLD Parsing System, which has a nice UI for creating and debugging the grammar and it does also generate table files which can then be loaded and processed by an existing engine or your own implementation (the file format for these CGT files is well documented).

As previously stated, you would normally use a parser generator to produce an LALR parser. A few such tools for Java are:
SableCC (my personal favourite)
CUP
Beaver3
SJPT
Gold

I just want to mention that my project CookCC ( http://coconut2015.github.io/cookcc/ ) is an LALR(1) parser generator plus a lexer generator (much like flex).
The unique feature of CookCC is that you can write your lexer and parser in Java using Java annotations. See the calculator example here: https://github.com/coconut2015/cookcc/blob/master/tests/javaap/calc/Calculator.java
