Programming an Interpreter for a Compiler - java

I'm writing an interpreter for a compiler program in Java. After checking the source code, syntax and semantics, I want to be able to run the source code, which is the input for my compiler. I'm wondering whether I can simply translate some tokens. For example, can I replace out (which prints stuff on screen) with System.out.print, and then feed the translated source code to Java to run it?
I've heard of using the Java Compiler API, would this be a good plan?
Thank you very much in advance!

What you are asking about is a virtual machine implementation technique. To run your code you should, in general, implement the following:
The first few steps I guess you have already done (design and describe the language semantics, construct the AST, and perform the required validation of the code).
You need to generate your byte code. Java itself works in exactly the same way: it generates another representation of the source code, going from human readable to machine readable.
Here you can see what Java byte code looks like: http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/
You need to implement a virtual machine (a stack machine) that reads the byte code and executes it.
So, as you can see, you should have 3 separate components (projects) for your task:
1. Language grammar
2. Compiler (byte code generator)
3. Virtual machine (interpreter of byte code)
P.S. I have experience creating a tiny Java-like compiler from scratch (defining the grammar with ANTLR, implementing the compiler, implementing the virtual machine), so I can probably share more information with you (even source code) if you need something in particular.
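To make the third component a bit more concrete, here is a minimal sketch of a stack machine. The instruction set (PUSH, ADD, MUL, PRINT) and the class name are invented for illustration; this is not JVM byte code, just the same idea at toy scale.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical four-instruction stack machine; the opcodes are made up for this sketch.
final class TinyStackVm {
    static final int PUSH = 0;  // the next cell is a literal to push
    static final int ADD = 1;   // pop two values, push their sum
    static final int MUL = 2;   // pop two values, push their product
    static final int PRINT = 3; // pop a value and append it to the output

    /** Runs the program and returns everything PRINT emitted. */
    static String run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        int pc = 0; // program counter
        while (pc < code.length) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;
                case ADD: stack.push(stack.pop() + stack.pop()); break;
                case MUL: stack.push(stack.pop() * stack.pop()); break;
                case PRINT: out.append(stack.pop()).append('\n'); break;
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Byte code a compiler might emit for: out (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT};
        System.out.print(run(program));
    }
}
```

A real machine would also need jumps, local variables and calls, but the fetch/decode/execute loop keeps the same shape.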

You really need to read some books and/or take courses on compilers - this can't be solved by a two-paragraph answer on SO.

You could create a cross-compiler which reads your language and outputs Java code to do the same thing. This may be the simplest option.
The Java Compiler API can be used to compile Java code. You would need to translate your existing code to Java first to use it.
This would not be the same thing as writing an interpreter. Is this homework? Does the task say you have to write the interpreter or can you have the code run any way which works?
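A rough sketch of the cross-compiler idea, assuming a toy language in which every statement has the form `out <expr>;` and expressions happen to be valid Java expressions already. The class and method names are hypothetical.

```java
// Sketch only: handles a single statement form, "out <expr>;".
final class OutToJava {
    /** Translates lines of the form `out <expr>;` into the source of a complete Java class. */
    static String translate(String className, String source) {
        StringBuilder java = new StringBuilder();
        java.append("public class ").append(className).append(" {\n");
        java.append("    public static void main(String[] args) {\n");
        for (String line : source.split("\n")) {
            String stmt = line.trim();
            if (stmt.isEmpty()) continue; // skip blank lines
            if (!stmt.startsWith("out ") || !stmt.endsWith(";")) {
                throw new IllegalArgumentException("unsupported statement: " + stmt);
            }
            // Assumption: the expression text can be pasted into Java unchanged.
            String expr = stmt.substring(4, stmt.length() - 1).trim();
            java.append("        System.out.print(").append(expr).append(");\n");
        }
        java.append("    }\n}\n");
        return java.toString();
    }

    public static void main(String[] args) {
        System.out.print(translate("Prog", "out \"hello\";\nout 1 + 2;"));
    }
}
```

The generated text could then be written to a .java file and handed to javac or to the Java Compiler API. A real translator would work from the AST rather than from raw lines.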

Unfortunately you did not mention which scripting language you are planning to support. If it is one of the well-known languages, just use a ready-made interpreter for it written in pure Java. See BSF and the Java SE 6 scripting API, javax.script (http://www.ibm.com/developerworks/java/library/j-javascripting1/)
If it is your own language, think twice: do you really need it?
If you are sure you need your own language, take a look at JavaCC.

First of all, thank you very much for the fast replies.
As part of our compiler project, we need to be able to compile and run a program written in our own specified language. The language is very similar to C. I am confused about how an interpreter works. Is there a simpler way to implement this, without generating byte code? My idea was to translate each statement into equivalent Java statements, and let Java handle the byte code generation.
I will look into the topics mentioned. Again, thank you very much for the suggestions.
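For what it's worth, one way to run a program without generating any byte code is to walk the AST directly. Below is a minimal sketch, assuming the parser already produces a tree; the node classes (Num, Var, Add, Assign, Out) are hypothetical and cover only a tiny fragment of a C-like language.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AST for a toy language; every class name here is made up for this sketch.
final class TreeWalker {
    interface Expr { int eval(Map<String, Integer> env); }
    interface Stmt { void exec(Map<String, Integer> env, StringBuilder out); }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) throw new IllegalStateException("undefined variable " + name);
            return v;
        }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public int eval(Map<String, Integer> env) { return left.eval(env) + right.eval(env); }
    }

    static final class Assign implements Stmt {
        final String name;
        final Expr value;
        Assign(String name, Expr value) { this.name = name; this.value = value; }
        public void exec(Map<String, Integer> env, StringBuilder out) { env.put(name, value.eval(env)); }
    }

    static final class Out implements Stmt {
        final Expr value;
        Out(Expr value) { this.value = value; }
        public void exec(Map<String, Integer> env, StringBuilder out) {
            out.append(value.eval(env)).append('\n');
        }
    }

    /** Executes the statements in order and returns what the `out` statements printed. */
    static String run(Stmt... program) {
        Map<String, Integer> env = new HashMap<>();
        StringBuilder out = new StringBuilder();
        for (Stmt s : program) s.exec(env, out);
        return out.toString();
    }
}
```

For example, `x = 2 + 3; out x + 1;` would be built as `run(new Assign("x", new Add(new Num(2), new Num(3))), new Out(new Add(new Var("x"), new Num(1))))`. This is usually slower than compiled byte code, but there is no separate code generator or virtual machine to write.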

Related

Writing languages for the JVM

Suppose I write a programming language; for the sake of a name, I'll call it lang.
To begin the long journey of writing lang, I decide to start by writing lang in itself. I can't actually run it, because there's nothing to run the program that runs itself.
So I begin by writing another compiler for lang in Java. This time, when I am done, I decide to convert it to Bytecode, and leave it at that. I now have a working compiler, which will convert all my lang code into Bytecode.
So I decide to plug in my self-compiler for the language, into the compiler I just made in Java. I then convert the self-compiler to Bytecode, and chuck out the Java compiler. I now have a lang compiler, purely written in itself, converted into Bytecode, ready for use.
This creates a solid program, and I understand all of this, but my question is, relative to compiler design for the JVM, what if I decide to release an update for my language? How do I go about updating the Bytecode? Do I simply re-write the updated version of the language in the older one?
I ask this because this is what I want to do. Write a non-existing language in itself, and then bootstrap it to the JVM by firstly creating a compiler in Java.
It's the same as what was done with C++. C with Classes was written, and then C++ in it, and finally C with Classes was abandoned for the bootstrapped C++. But then how on earth did they ever go about updating the language?
I'll answer this from two possible scenarios in your development. With any byte-code language at any time you can update the virtual machine or the language.
Suppose first you wanted to update your language to have new syntax or change the current semantics. Then you'd keep your current compiled compiler written in lang (compiler A) and edit its source so that it can correctly compile your new features. Then you compile your compiler using the old one giving you compiler B. If necessary, you can now rewrite the compiler to use the new features and then compile it using compiler B to give you compiler C.
What if the JVM changes? Well, in that case you keep an old version of the JVM around, adjust your compiler to cope with the new bytecode changes, and then compile it with the old one (this is analogous to compiler B from before). That will get you a compiler that compiles to the new bytecode but runs on the old VM. The next step is to get it to compile itself, and now you have a new compiler that runs on the new VM (analogous to compiler C).
I don't think your compiler is the best way to go about this.
I'd start with a grammar for my language.
Next comes the lexer/parser to turn expressions in my language to an abstract syntax tree (AST). The AST is a correct intermediate representation of the expression.
You would emit bytecode or assembly language instructions for the virtual machine or processor of your choice by writing a code generator that traverses the AST.
Where does your update happen?
If it's language fundamentals, you have to modify both the grammar and the bytecode emission.
If you're optimizing the bytecode or porting to a new processor you have to modify the code generator.
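As a rough illustration of the code generator step, here is a sketch that walks an expression AST in post-order and emits instructions for a stack machine. The node classes and the textual instruction names (push, add, mul) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a code generator; node types and instruction names are hypothetical.
final class AstCodeGen {
    interface Node {}

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }

    /** Returns the instruction list for the given expression tree. */
    static List<String> emit(Node node) {
        List<String> code = new ArrayList<>();
        emit(node, code);
        return code;
    }

    // Post-order traversal: operands first, then the operator that consumes them.
    private static void emit(Node node, List<String> code) {
        if (node instanceof Num) {
            code.add("push " + ((Num) node).value);
        } else if (node instanceof BinOp) {
            BinOp b = (BinOp) node;
            emit(b.left, code);
            emit(b.right, code);
            switch (b.op) {
                case '+': code.add("add"); break;
                case '*': code.add("mul"); break;
                default: throw new IllegalArgumentException("unknown operator " + b.op);
            }
        } else {
            throw new IllegalArgumentException("unknown node type");
        }
    }
}
```

Porting to a different target then means swapping out only this traversal, while the grammar and the AST stay as they are.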
The first lang compiler can be written in a subset of lang. To bootstrap it you only need a compiler (or even an interpreter) for that subset, and this can be written in Java.
Later, more extensive compilers can be written in lang, and so can newer versions.
You could even write a translator that converts a lang program to Java, use that to create a first translator in lang, and then turn it into a bytecode compiler.

Java to C/C++, or at least get the Java converted code

Are there any Java -> C/C++ Converters? Well I expect a no.
But I know Java works by converting the Java Byte Code, into code that the OS can understand using JIT. So is there any way to get this "converted code"?
Thanks.
Thanks to Baltasarq, who set me on the right course, I started looking for ahead-of-time compilers. Amazingly, I found GCJ, which is included in GCC (I think the latest?). It does exactly what I want: take a Java file and turn it into an EXE. But it needs 44 DLLs for a simple "Hello World" app. Oh well :D
But I know Java works by converting the Java Byte Code, into code that the OS can understand using JIT. So is there any way to get this "converted code"?
You're talking about compiling code "ahead of time", or at least that's the name it receives in the Mono project (a free implementation of .NET/C#). If you are interested in this, you could convert your code from Java to C# (which is at least easier than C++), and then take advantage of this feature. There is even a tool dedicated to this purpose: mkbundle.
I do not know whether there is a way to get at the code the JIT emits. It is a runtime conversion that only kicks in after some part of the code path has been executed something like 1,000 or 10,000 times; until then the JVM runs the byte code directly, and it can also be made to run on byte code only. I am sure one can get at this "converted code", but what you are asking for is source code.
Without knowing what your needs are, I would think it faster to write the code by hand. If I need a short loop or something to explain something to somebody, I can write it by hand faster than I can find a tool to do it.
There actually is a dumper that comes as part of a standard install, and I know it works because I have some code saved in source file comments that it produced.
Are there any Java -> C/C++ Converters? Well I expect a no.
Well, there are some projects, but none I know of that are production-quality. Mostly they translate to C, as the additional features that C++ offers do not align very well with what Java does. A Google search will give you quite a few projects. Also see this question:
Does a Java to C++ converter/tool exist?
But I know Java works by converting the Java Byte Code, into code that the OS can understand using JIT. So is there any way to get this "converted code"?
No, not that I am aware of. You could theoretically take the HotSpot source code (it's available as part of OpenJDK), and insert logging statements to dump the generated machine code. I don't know if anyone has done that yet.

Is it possible to automatically convert java code to PHP?

Are there any tools for converting Java code to PHP? I have the source code of a Java library and I need to convert it to PHP.
It is possible to automatically convert it. This is called a source to source compiler. Normally when you compile software, the parser will build an abstract syntax tree and convert this into the target machine language code. But it is just as possible to have a compiler convert this into another high level (compilable) language.
Java is a strongly typed language and PHP is not, so source to source compilers between the two are rare and the code conversion they produce is incomplete. That said, there is a reasonably good one with a free demo at: http://javatophp.com
Automatically: no, not now. Maybe in the future. Don't spend time on it, write new code.
I don't think there is a solution like this currently.
You might try using a php-java bridge that would allow you to call the java code from within PHP:
http://php-java-bridge.sourceforge.net/pjb/
Zend Server also provides a bridge
A team of 5 folks at Facebook spent 18 months writing software that converts PHP to C++ (meet: HipHop). There is no such software for transforming Java to PHP yet.
The answer is: yes... it is possible if you have a year and a half and a team of pro programmers :)
Otherwise, you rewrite it manually (I think this is your choice).
There are lots of aspects of Java that cannot be expressed in PHP. Type safety for one. This sounds like a fool's errand to me. If you were looking to go in the opposite direction the question might have some interest.

byte code, libraries and Java

If I wanted to create a new language for Java, I should make a compiler that is able to generate byte code compatible with the JVM spec, right? And also with the JDK libraries?
Where can I find some info?
Thanks.
Depends what you mean by "create a new language for Java" -- do you mean a language that compiles to bytecode and the code it generates can be used from any Java app (e.g. Groovy) or an interpreted language (for which you want to write a parser in Java)?
If it is the former, then @Joachim is right: look at the JVM spec. For the latter, look at the likes of JavaCC for creating a parser for your language grammar.
I would start with a compiler which produced Java source. You may find this easier to read/understand/debug. Later you can optimise it to produce byte code.
EDIT:
If you have features which cannot be easily translated to Java code, you should be able to create a small number of byte code classes using Jasmin with all the exotic functionality which you can test to death. From the generated Java code this will look like a plain method call. The JVM can still inline the method so this might not impact performance at all.
The Java Virtual Machine Spec should have most of what you need.
An excellent library for bytecode generation/manipulation is ASM: http://asm.ow2.org.
It is very versatile and flexible. Note, however, that its API is based on events (similar to SAX parsers): it reads .class files and invokes a method whenever it encounters a new entity (class declaration, method declaration, statements, etc.). This may seem a bit awkward at first, but it saves a lot of memory compared to the alternative, where the library reads the input, spits out a fully built tree structure, and you then have to iterate over it.
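To show what "based on events" means in practice, here is a toy sketch of the pattern. It does not use ASM: the Handler interface, its method names and the text input format are all invented for illustration, and ASM's real visitor classes have different names and signatures.

```java
import java.util.Arrays;
import java.util.List;

// Toy event-style reader. NOT the ASM API; every name here is hypothetical.
final class EventStyleReader {
    /** Callbacks fired while the input is being read. */
    interface Handler {
        void onClass(String name);
        void onMethod(String name);
    }

    /** Walks entries such as "class Foo" or "method bar" and fires one callback per entity. */
    static void read(List<String> entries, Handler handler) {
        for (String entry : entries) {
            if (entry.startsWith("class ")) {
                handler.onClass(entry.substring("class ".length()));
            } else if (entry.startsWith("method ")) {
                handler.onMethod(entry.substring("method ".length()));
            } else {
                throw new IllegalArgumentException("unknown entry: " + entry);
            }
        }
    }

    /** Counts methods without ever building a tree of the whole input. */
    static int countMethods(List<String> entries) {
        final int[] count = {0};
        read(entries, new Handler() {
            public void onClass(String name) { /* not interested in classes */ }
            public void onMethod(String name) { count[0]++; }
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countMethods(Arrays.asList("class Foo", "method bar", "method baz")));
    }
}
```

The handler keeps only the state it cares about (here a single counter), which is where the memory saving over a full tree comes from.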
I don't think this will help much practically, but it has a lot of sweet theoretical stuff that I think you'll find useful.
http://www.codeproject.com/KB/recipes/B32Machine1.aspx

Are there any examples of code that is difficult to decompile?

Sometimes when decompiling Java code, the decompiler doesn't manage to decompile it properly and you end up with little bits of bytecode in the output.
What are the weaknesses of decompilers? Are there any examples of Java source code that compiles into difficult-to-decompile bytecode?
Update:
Note that I'm aware that exploiting this information is not a safe way to hide secrets in code, and that decompilers can be improved in the future.
Nonetheless I am still interested in finding out what kinds of code fox today's crop of decompilers.
Any Java byte code that's been through an obfuscator will have "ridiculous" output from the decompiler. Also, when you have other languages like Scala that compile to JVM byte code, there's no rule that the byte code be easily represented back in Java, and likely isn't.
Over time, decompilers have to keep up with the new language features and the byte code they produce, so it's plausible that new language features are not easily reversed by the tools you're using.
Edit: As an example in .NET, the following code:
    lock (this)
    {
        DoSomething();
    }
compiles to roughly this:
    Monitor.Enter(this);
    try
    {
        DoSomething();
    }
    finally
    {
        Monitor.Exit(this);
    }
The decompiler has to know that C# (as opposed to any other .NET language) has a special syntax dedicated to exactly those two calls. Otherwise you get unexpected (verbose) results.
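The JVM has a similar case. A synchronized block compiles to monitorenter and monitorexit instructions plus an exception-table entry that releases the monitor if the body throws, and a decompiler has to recognise that pattern to give you back the block syntax. The sketch below only shows the two source-level shapes side by side. Java source cannot issue monitorenter directly, so ReentrantLock stands in for the explicit form, and the class and method names are made up.

```java
import java.util.concurrent.locks.ReentrantLock;

// Two ways of writing "acquire, run the body, always release". Names are hypothetical.
final class LockShapes {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    /** Compact form: the compiler generates the acquire/release pair for us. */
    int withSynchronized() {
        synchronized (monitor) {
            return ++counter;
        }
    }

    /** Expanded form: acquire, try, and release in finally so it happens on every exit path. */
    int withExplicitLock() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }
}
```

A decompiler that does not know the first form would have to fall back on something that looks like the second, or on raw bytecode.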
The JDBC type-4 drivers for DB2 Connect are classics. Everything called one or two-letter names, irrelevant code that ends up having no effect, and more. I once tried to take a look to debug a particularly annoying problem and basically gave up. I'm hoping (but by no means confident) that this was passed through an obfuscator rather than the code actually looking like that.
Another favorite trick (although I can't remember the product) was to rename all objects to be constructed from the set {'0','O','l','1'}, which made reading it very difficult.
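For illustration, a sketch of how such names could be generated. This is a guess at the general idea, not the scheme that product actually used. The first character is restricted to 'O' or 'l' because a Java identifier cannot start with a digit.

```java
// Hypothetical generator for look-alike identifiers built from O, 0, l and 1.
final class ConfusableNames {
    private static final char[] ALPHABET = {'O', '0', 'l', '1'};

    /** Returns a distinct name for each index in [0, 2 * 4^(length-1)); length must be at least 1. */
    static String name(int index, int length) {
        char[] chars = new char[length];
        // The lowest bit picks the leading letter, so the result is a legal identifier.
        chars[0] = (index & 1) == 0 ? 'O' : 'l';
        int rest = index >>> 1;
        // Each remaining position consumes two bits of the index.
        for (int i = 1; i < length; i++) {
            chars[i] = ALPHABET[rest & 3];
            rest >>>= 2;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(name(i, 4));
        }
    }
}
```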
Assuming you can decompile back to a reasonable style of source code (you can't always do that), what is hard to "reverse engineer" are algorithms that operate in unfamiliar problem domains. If you don't understand Fast Fourier transforms, it doesn't matter much if you can get back the code that implements an FFT Butterfly.
(If this phrase is unfamiliar to you, I've already won if I encode one. If it is familiar to you, you are a pretty good engineer and probably don't have any interest in reverse engineering code). [Your mileage with North Koreans may vary.]
Java keeps a lot of information in the bytecode (for instance many names), so it is relatively easy to decompile. Hard-to-decompile bytecode is mostly generated from hard-to-read source code (so that's not really an option). If you really want to obfuscate your code, use an obfuscator that renames all methods and variables to unrecognizable stuff.
Exceptions are often difficult to decompile.
However, any code which has been obfuscated or has been written in another language is difficult to decompile.
BTW: Why would you want to know this?
Java bytecode does not correspond directly to Java constructs, so decompiling requires knowing which bytecode sequence corresponds to which Java construct.
The Soot framework for decompiling java byte code has a lot of information on this, but their webpage is down for me right now.
http://www.sable.mcgill.ca/soot/
