How can I count all the operators and operands in a Java class file? Does anyone have an idea?
Doing this kind of thing using regexes is unreliable. The syntax of Java is sufficiently complex that there are bound to be tricky corner cases that will cause your regexes to miscount.
Similarly using a bytecode analyser is liable to give you incorrect results because there isn't necessarily a one-to-one correspondence between source code operators / operands and bytecode instructions. The Java compiler may reorganize and rewrite the code in non-obvious ways.
The best way to do this sort of thing is to find a decent Java AST library, use that to parse your source code, and then traverse the AST to extract the information you need. (In this case, you need to count the operator and operand nodes.)
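As a rough sketch of the AST approach, the following uses the Compiler Tree API that ships with the JDK (com.sun.source) rather than a third-party AST library. The class name OperatorCounter is made up for illustration, and the choice of what counts as an operator (binary, unary, assignment and compound assignment nodes) or an operand (identifiers and literals) is an assumption; adjust it to whatever metric definition you are working from. It needs a JDK, since ToolProvider.getSystemJavaCompiler() returns null on a plain JRE.

```java
import com.sun.source.tree.AssignmentTree;
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.CompoundAssignmentTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.UnaryTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: counts operator and operand nodes in Java source text.
final class OperatorCounter {

    /** Returns {operatorCount, operandCount}. Only parses; nothing is compiled. */
    static int[] count(final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));

        final int[] counts = new int[2]; // [0] = operators, [1] = operands
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override public Void visitBinary(BinaryTree node, Void p) {
                counts[0]++;
                return super.visitBinary(node, p);
            }
            @Override public Void visitUnary(UnaryTree node, Void p) {
                counts[0]++;
                return super.visitUnary(node, p);
            }
            @Override public Void visitAssignment(AssignmentTree node, Void p) {
                counts[0]++;
                return super.visitAssignment(node, p);
            }
            @Override public Void visitCompoundAssignment(CompoundAssignmentTree node, Void p) {
                counts[0]++;
                return super.visitCompoundAssignment(node, p);
            }
            @Override public Void visitIdentifier(IdentifierTree node, Void p) {
                counts[1]++;
                return super.visitIdentifier(node, p);
            }
            @Override public Void visitLiteral(LiteralTree node, Void p) {
                counts[1]++;
                return super.visitLiteral(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

Note that class-type names such as String also appear as identifier nodes here, and that the ternary operator, instanceof, method calls and array accesses are not counted; a real metric would need to decide how to treat each of those.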
Forget regex (you'll never get that right without getting false positives like operators in comments etc), you're going to have to run a visitor over your code that counts operators. Now you can either use a source code parser or a byte code parser to do that.
For source code parsing I'd suggest the javaparser project. There, you'd create a custom Visitor extending VoidVisitorAdapter and overriding several relevant methods like this:
@Override
public void visit(AssignExpr n, A arg) {
    // track the operator here
    super.visit(n, arg); // resume visitor
}
On the byte code side, you'd probably use ASM and extend ClassAdapter (replaced by ClassVisitor in ASM 4 and later) to create your visitor. Both versions should work equally well. Or maybe not, as Stephen C writes (the compiler may have added or removed some operations).
You could try to analyze the bytecode of your class using a library like bcel.
Or use the sourceforge project lachesis (I haven't tried it):
Lachesis Analysis is a Software Complexity Measurement program for Object-Oriented source code. Analysis for Java source code and Java byte-code only is currently available.
I'm trying to make a parsing library for JDK 11.x that reads Haskell code as input, then translates it into Java to be executed by the JVM. I'm calling it Jaskell, but I need to know Haskell's formal grammar structure in order to determine what type of parser Jaskell needs to be (i.e. LL or LR parser).
I need to know Haskell's formal grammar structure
Haskell's grammar is context sensitive due to significant indentation. In both the lexical and the context-free syntax there are some ambiguities that are to be resolved by the longest match ("maximal munch") rule.
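To make the longest-match rule concrete, here is a small sketch of a "maximal munch" tokenizer. The symbol table is a made-up handful of operators chosen for illustration, not Haskell's actual lexical grammar, and the class name MaximalMunch is hypothetical. The point is only that at each position the lexer takes the longest symbol that matches, so ">>=" becomes one token rather than ">>" followed by "=".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of longest-match ("maximal munch") lexing.
final class MaximalMunch {

    // Illustrative token set only; several entries are prefixes of others.
    private static final List<String> SYMBOLS =
            Arrays.asList(">>=", ">>", ">=", ">", "=", "->", "-");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            if (Character.isLetterOrDigit(c)) {
                // Names and numbers: consume the longest alphanumeric run.
                int start = pos;
                while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
                    pos++;
                }
                tokens.add(input.substring(start, pos));
                continue;
            }
            // Symbols: among all candidates matching here, keep the longest.
            String best = null;
            for (String s : SYMBOLS) {
                if (input.startsWith(s, pos) && (best == null || s.length() > best.length())) {
                    best = s;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }
}
```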
Differences between the syntax specification and the implementations do exist; supposedly none of them implements the spec precisely. There are also a whole lot of language extensions.
There's a ready-made scanner definition out there to scan Haskell with an older version of Antlr, but still no parsing grammar. Guess you'd have to grab the parsers from aforementioned projects (Frege, Eta) or the ghc itself.
Looking at the source tree and the description of the ghc-lib-parser package, we learn (thanks to this post) that GHC's parser is produced by the happy parser generator, which generates a LALR(1) parser. This means that Haskell's grammar is unambiguous and does not require happy's GLR-generating abilities.
make a parsing library that reads Haskell code ... then translates it into Java
You want to create a Haskell-Java cross compiler. Just looking at the discrepancy between the type systems, I'd say this is pretty far-fetched. One can squeeze Haskell into JVM bytecode, but I can't imagine that any kind of generated Java would be very useful. On the other hand, Haskell compiles internally to something called the "core language", basically a typed lambda calculus. That might be a more straightforward starting point.
If you're interested, there's a Haskell-Javascript cross compiler (based on ghc), and there's a javascript-java cross compiler. Problem solved! Or isn't it?
Instead of reading Haskell programs directly, you may try to read the Core or STG intermediate programs that GHC can output for the Haskell programs.
Edit: I have rewritten the question to hopefully make it more understandable.
I do not want to overload!
If you have the following code:
ImmutableObject mutableReference = new ImmutableObject();
mutableReference = mutableReference.doStuff(args);
Can a compile time or pre-compile time process replace defined text formats? For example:
DEFINE X.=Y AS X = X.Y
could replace
mutableReference .= doStuff(args) with mutableReference = mutableReference.doStuff(args);
So some process knows that the code before ".=" is X and after is Y. Similar to syntactic sugar, before compiling or during, just replace X.=Y with X = X.Y.
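Java has no built-in preprocessor, but the textual rewrite described above could be done by a separate step that runs before javac. The following is a deliberately naive sketch of such a step; the class name DotAssignPreprocessor is made up, and it assumes the left-hand side is a plain (possibly dotted) name. Because it works on raw text, it would also rewrite ".=" inside string literals and comments, so treat it as an illustration rather than something robust.

```java
import java.util.regex.Pattern;

// Hypothetical pre-compile step: rewrites "X .= Y" into "X = X.Y".
final class DotAssignPreprocessor {

    // Group 1 is a (possibly dotted) Java name, followed by ".=" with optional whitespace.
    private static final Pattern DOT_ASSIGN = Pattern.compile(
            "([A-Za-z_$][A-Za-z0-9_$]*(?:\\.[A-Za-z_$][A-Za-z0-9_$]*)*)\\s*\\.=\\s*");

    static String expand(String source) {
        // "$1" in the replacement refers to the captured left-hand side.
        return DOT_ASSIGN.matcher(source).replaceAll("$1 = $1.");
    }

    public static void main(String[] args) {
        System.out.println(expand("mutableReference .= doStuff(args);"));
    }
}
```

Such a step would have to be wired into the build (for example as a source-generation phase) so that everyone compiling the project runs it, which is the sharing concern raised about IDE-only solutions.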
Below is the old version of the question.
I have the following "form" of code for lack of a better word.
turnStates = turnStates.add(currentState); // log end of turn state.
//turnStates.=add(currentState);
//turnStates=.add(currentState);
Where turnStates can be a reference to any immutable object.
I would like it to look like the code commented out or similar.
Much like integers that have ++ and += I'd like a way to write my own for my immutables.
I think I recall some pre-processor stuff from C++ that I think could replace predefined text for code snippets. I was wondering if there was a way in java to define a process for replacing my desired code for the working code at compile time.
I'm sure you could make the IDE do it, but then you can't share the code with others not running a pre-configured IDE.
Edit:
turnStates is immutable and returns a different object on a call to add. It is test code and I have my reasons why a list, or as it is at the moment acting more like a stack, is immutable. Irrelevant for the question as I could simply replace it with
player = player.doSomething(args) where doSomething(args) returns a Player instance. Player is just a small part of the model and is costless to be immutable.
I know overloads and syntax can't be changed in Java. What I tried to convey originally (sorry if it didn't come across that way) is this:
I was hoping there was a syntax I wasn't aware of, maybe involving the # sign, that could replace text before compiling. So for example:
DEFINE X.=Y AS X = X.Y where X = turnStates and Y = add() in my example.
But, as the answer I upvoted said, the answer seems to be no, so I'll check out Scala.
No. Java explicitly does not support operator overloading for user defined data types. However, scala is a JVM hosted language and does.
Unlike C++, Java doesn't support operator overloading, but Scala and Groovy do.
Scala can be integrated into Java, but the operator overloading part is still not directly supported from Java: you will not be able to use the operator itself, only something like #eq(...) for the "=" operator.
Check this link out for a little more detail if you want to know about Scala integration into java
Bottom line:
operator overloading is not supported by Java
And if your project requires a lot of vector addition, subtraction, etc. (i.e. a lot of custom operators), then a good suggestion would be to use C#, which is a Java-like language.
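If you stay in Java, the usual substitute for custom operators is ordinary methods on an immutable type, each returning a new instance. A minimal sketch, with a made-up Vec2 class and method names chosen for illustration:

```java
// Hypothetical immutable 2D vector: methods stand in for the +, - and * operators.
final class Vec2 {
    final double x;
    final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    Vec2 plus(Vec2 other) {
        return new Vec2(x + other.x, y + other.y);
    }

    Vec2 minus(Vec2 other) {
        return new Vec2(x - other.x, y - other.y);
    }

    Vec2 times(double factor) {
        return new Vec2(x * factor, y * factor);
    }

    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}
```

Calls chain left to right, so what would be (a + b) * 2 with operators is written a.plus(b).times(2).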
I've got a bit of an interesting challenge
To the point:
I want to allow a user to enter an expression in a text field, and have that string treated as a python expression. There are a number of local variables I would like to make available to this expression.
I do have a solution though it will be cumbersome to implement. I was thinking of keeping a Python class source file, with a function that has a single %s in it. When the user enters his expression, we simply do a string format, and then call Jython's interpreter, to spit out something we can execute. There would have to be a number of variable declaration statements in front of that expression to make sure the variables we want to expose to the user for his expression.
So the user would be presented with a text field, he would enter
x1 + (3.5*x2) ** x3
and we would do our interpreting process to come up with an open delegate object. We then punch the values into this object from a map, and call execute, to get the result of the expression.
Any objections to using Jython, or should I be doing something other than modifying source code? I would like to think that some kind of mutable object akin to C#'s Expression object, where we could do something like
PythonExpression expr = new PythonExpression(userSuppliedText)
expr.setDefaultNamespace();
expr.loadLibraries("numPy", /*other libraries?*/);
//comes from somewhere else in the flow, but effectively we get
Map<String, Double> symbolValuesByName = new HashMap<>(){{
put("x1", 3.0);
put("x2", 20.0);
put("x3", 2.0);
}};
expr.loadSymbols(symbolValuesByName);
Runnable exprDelegate = expr.compile();
//sometime later
exprDelegate.run();
but, I'm hoping for a lot, and it looks like Jython is as good as it gets. Still, modifying source files and then passing them to an interpreter seems really heavy-handed.
Does that sound like a good approach? Do you guys have any other libraries you'd suggest?
Update: NumPy does not work with Jython
I should've discovered this one on my own.
So now my question shifts: Is there any way that from a single JVM process instance (meaning, without ever having to fork) I can compile and run some Python code?
If you simply want to parse the expressions, you ought to be able to put something together with a Java parser generator.
If you want to parse, error check and evaluate the expressions, then you will need a substantial subset of the functionality of a full Python interpreter.
I'm not aware of a subset implementation.
If such a subset implementation exists, it is unclear that it would be any easier to embed / call than to use a full Python interpreter ... like Jython.
If the powers that be dictate that "thou shalt use python", then they need to pay for the extra work it is going to cause you ... and the next guy who is going to need to maintain a hybrid system across changes in requirements, and updates to the Java and Python / Jython ecosystems. Factor it into the project estimates.
The other approach would be to parse the full Python expression grammar, but limit what your evaluator can handle ... based on what is actually required, and what is implementable in your project's time-frame. Limit the types supported and the operations on the types. Limit the built-in functions supported. Etcetera.
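To give a feel for how small a "limited evaluator" can be, here is a sketch of a hand-written recursive-descent evaluator in plain Java. It is an assumption-laden illustration, not a Python implementation: the class name MiniExpr is made up, it supports only doubles, named variables, parentheses and the operators +, -, *, / and **, and it evaluates while parsing instead of building a reusable tree. The grammar in the comments follows Python's precedence for these operators (** is right-associative and binds tighter than a unary minus on its left).

```java
import java.util.Map;

// Hypothetical evaluator for a tiny arithmetic subset of Python expressions.
final class MiniExpr {
    private final String src;
    private final Map<String, Double> vars;
    private int pos;

    private MiniExpr(String src, Map<String, Double> vars) {
        this.src = src;
        this.vars = vars;
    }

    static double eval(String src, Map<String, Double> vars) {
        MiniExpr parser = new MiniExpr(src, vars);
        double value = parser.expr();
        parser.skipSpaces();
        if (parser.pos != src.length()) throw parser.error("unexpected input");
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (true) {
            if (accept("+")) v += term();
            else if (accept("-")) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (true) {
            if (accept("*")) v *= factor();
            else if (accept("/")) v /= factor();
            else return v;
        }
    }

    // factor := ('-' | '+') factor | power
    private double factor() {
        if (accept("-")) return -factor();
        if (accept("+")) return factor();
        return power();
    }

    // power := atom ('**' factor)?   ("**" is tried here before term() can see "*")
    private double power() {
        double base = atom();
        return accept("**") ? Math.pow(base, factor()) : base;
    }

    // atom := number | name | '(' expr ')'
    private double atom() {
        if (accept("(")) {
            double v = expr();
            if (!accept(")")) throw error("expected ')'");
            return v;
        }
        int start = pos; // accept() has already skipped leading whitespace
        while (pos < src.length() && isWordChar(src.charAt(pos))) pos++;
        String word = src.substring(start, pos);
        if (word.isEmpty()) throw error("expected a number or a name");
        if (Character.isDigit(word.charAt(0)) || word.charAt(0) == '.') {
            return Double.parseDouble(word);
        }
        Double value = vars.get(word);
        if (value == null) throw error("unknown variable " + word);
        return value;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '_';
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean accept(String token) {
        skipSpaces();
        if (!src.startsWith(token, pos)) return false;
        pos += token.length();
        return true;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }
}
```

With x1 = 3.0, x2 = 20.0 and x3 = 2.0, MiniExpr.eval("x1 + (3.5*x2) ** x3", vars) yields 4903.0. Anything beyond this subset (function calls, integers versus floats, comparisons) is exactly the work the paragraph above suggests scoping carefully.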
Assuming that you go down the Java calling Jython route, there is a lot of material on how to implement it here: http://www.jython.org/jythonbook/en/1.0/JythonAndJavaIntegration.html
In certain problem I need to parse a Java source code fragment that is potentially incomplete. For example, the code can refer to variables that are not defined in such fragment.
In that case, I would still like to parse such incomplete Java code, transform it into a convenient inspectable representation, and be able to generate source code from that abstract representation.
What is the right tool for this ? In this post I found suggestions to use Antlr, JavaCC or the Eclipse JDT.
However, I did not find any reference regarding dealing with incomplete Java source code fragments, hence this question (and in addition the linked question is more than two years old, so I am wondering if something new is on the map).
As an example, the code could be something like the following expression:
"myMethod(aVarName)"
In that case, I would like to be able to somehow detect that the variable aVarName is referenced in the code.
Uhm... This question does not have anything even vaguely like a simple answer. Any of the above parser technologies will allow you to do what you wish to do, if you write the correct grammar and manipulate the parser to do fallback parsing unknown token passover sort of things.
The least amount of work to get you where you're going is either to use ANTLR which has resumable parsing and comes with a reasonably complete java 7 grammar, or see what you can pull out of the eclipse JDT ( which is used for doing the error and intention notations and syntax highlighting in the eclipse IDE. )
Note that none of this stuff is easy -- you're writing klocs, not just importing a class and telling it to go.
At a certain point of incorrect/incompleteness all of these strategies will fail just because no computer ( or even person for that matter ) is able to discern what you mean unless you at least vaguely say it correctly.
Eclipse contains just that: a compiler that can cope with incomplete Java code (basically, that was one reason for these guys to implement their own Java compiler). See here for a better explanation.
There are several tutorials that explain the ASTParser, here is one.
If you just want basic parsing - an undecorated AST - you can use existing Java parsers. But from your question I understand you're interested in deeper inspection of the partial code. First, be aware the problem you are trying to solve is far from simple, especially because partial code introduces a lot of ambiguities.
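For the undecorated-AST case, one option that needs no extra dependency is the parser inside the JDK itself (the Compiler Tree API in com.sun.source). The sketch below is only an illustration under some assumptions: the fragment is a single expression, it is wrapped in a throwaway class so that the parser accepts it, and the names FragmentNames and Wrapper are made up. Because it only parses and never resolves symbols, undefined variables are not a problem.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists the names an expression fragment refers to.
final class FragmentNames {

    static Set<String> referencedNames(String expression) {
        // Wrap the fragment so that it becomes a syntactically complete compilation unit.
        final String wrapped = "class Wrapper { Object run() { return " + expression + "; } }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Wrapper.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return wrapped;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, null, null, Collections.singletonList(file));

        final Set<String> names = new LinkedHashSet<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override public Void visitIdentifier(IdentifierTree node, Void p) {
                names.add(node.getName().toString());
                return null;
            }
            @Override public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                ExpressionTree target = node.getMethodSelect();
                if (target instanceof MemberSelectTree) {
                    // obj.call(...): record the receiver, not the method name.
                    scan(((MemberSelectTree) target).getExpression(), p);
                } else if (!(target instanceof IdentifierTree)) {
                    scan(target, p);
                }
                // An unqualified method name such as myMethod is skipped on purpose.
                scan(node.getArguments(), p);
                return null;
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

For "myMethod(aVarName)" this returns just aVarName. Without type information it cannot tell a variable from a class name (in new Foo(x), Foo is reported too), which is the kind of ambiguity the typed tools discussed here try to resolve.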
But there is an existing solution - I needed to solve a similar problem, and found that a nice fellow called Barthélémy Dagenais has worked on it, producing a paper and a pair of open-source tools - one based on Soot and the other (which is generally preferable) on Eclipse. I have used both and they work, though they have their own limitations - don't expect miracles.
Here's a direct link to a quick tutorial on how to start with the Eclipse-based tool.
I needed to solve a similar problem in my recent work. I tried several tools, including Eclipse JDT ASTParser, the Python library javalang, and PPA, and I'd like to share my experience. To sum up, they can all parse code fragments to some extent, but each of them occasionally fails when the fragment is too ambiguous.
Eclipse JDT ASTParser
Eclipse JDT ASTParser is the most powerful and widely-used tool. This is a code snippet to parse the method invocation node.
ASTParser parser = ASTParser.newParser(AST.JLS8);
parser.setResolveBindings(true);
parser.setKind(ASTParser.K_STATEMENTS);
parser.setBindingsRecovery(true);
Map options = JavaCore.getOptions();
parser.setCompilerOptions(options);
parser.setUnitName("test");
String src = "System.out.println(\"test\");";
String[] sources = { };
String[] classpath = {"C:/Users/chenzhi/AppData/Local/Programs/Java/jdk1.8.0_131"};
parser.setEnvironment(classpath, sources, new String[] { }, true);
parser.setSource(src.toCharArray());
final Block block = (Block) parser.createAST(null);
block.accept(new ASTVisitor() {
    @Override
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false;
    }
});
Pay attention to parser.setKind(ASTParser.K_STATEMENTS): it sets the kind of construct to be parsed from the source. ASTParser defines four kinds (K_COMPILATION_UNIT, K_CLASS_BODY_DECLARATIONS, K_EXPRESSION, K_STATEMENTS); see this javadoc to understand the difference between them.
javalang
javalang is a simple python library. This is a code snippet to parse the method invocation node.
import javalang

src = 'System.out.println("test");'
tokens = javalang.tokenizer.tokenize(src)
parser = javalang.parser.Parser(tokens)
try:
    ast = parser.parse_expression()
    if type(ast) is javalang.tree.MethodInvocation:
        print(ast)
except javalang.parser.JavaSyntaxError as err:
    print("wrong syntax", err)
Pay attention to ast = parser.parse_expression(). Just as with parser.setKind() in Eclipse JDT ASTParser, you have to pick the parsing function that matches your fragment, or you will get a 'javalang.parser.JavaSyntaxError' exception. You can read the source code to figure out which function you should use.
PPA
Partial Program Analysis for Java (PPA) is a static analysis framework that transforms the source code of an incomplete Java program into a typed Abstract Syntax Tree. As @Oak said, this tool came from academia.
PPA comes as a set of Eclipse plug-ins, which means it needs to run with Eclipse. It provides a headless mode that runs without displaying the Eclipse GUI or requiring any user input, but it is still heavyweight.
String src = "System.out.println(\"test\");";
ASTNode node = PPAUtil.getSnippet(src, new PPAOptions(), false);
// Walk through the compilation unit.
node.accept(new ASTVisitor() {
    @Override
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false;
    }
});
Suppose I want to add minor syntactic sugars to Java. Just little things like adding regex pattern literals, or perhaps base-2 literals, or multiline strings, etc. Nothing major grammatically (at least for now).
How would one go about doing this?
Do I need to extend the bytecode compiler? (Is that possible?)
Can I write Eclipse plugins to do simple source code transforms before feeding it to the standard Java compiler?
I would take a look at Project Lombok and try to reuse the approach they take. They use Java 5 annotations to hook in a Java agent which can manipulate the abstract syntax tree before the code is compiled. They are currently working on creating an API to allow custom transformers to be written which can be used with javac, or the major IDEs such as Eclipse and NetBeans. As well as annotations which trigger code to be generated, they are also planning on adding syntax changes (possibly mixin or pre-Java 7 closure syntax).
(I may have some of the details slightly off, but I think I'm pretty close).
Lombok is open source so studying their code and trying to build on that would probably be a good start.
Failing that, you could attempt to change the javac compiler. Though from what I've heard that's likely to be a hair-pulling exercise in frustration for anyone who is not a compiler and Java expert.
You can hack javac, notably with JSR 269 (pluggable annotation processing). You can hook into the visitor that traverses the statements in the source code and transform them.
Here, for instance, is the core of a transformation that adds support for Roman numerals in Java (read the complete post for more details, of course). It seems relatively easy.
public class Transform extends TreeTranslator {
    @Override
    public void visitIdent(JCIdent tree) {
        String name = tree.getName().toString();
        if (isRoman(name)) {
            result = make.Literal(numberize(name));
            result.pos = tree.pos;
        } else {
            super.visitIdent(tree);
        }
    }
}
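To show where such a hook plugs in, here is a read-only sketch that uses only the public APIs (javax.annotation.processing plus com.sun.source). It is not the code from the post: it merely finds identifiers that look like Roman numerals and does not rewrite anything, since actual rewriting relies on the internal com.sun.tools.javac.tree classes used by TreeTranslator above. The class name RomanIdentifierFinder and the helper findIn are made up, and the processor is run in-process with -proc:only so that javac stops after annotation processing.

```java
import com.sun.source.tree.IdentifierTree;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.net.URI;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical processor: collects identifiers that look like Roman numerals.
@SupportedAnnotationTypes("*")
final class RomanIdentifierFinder extends AbstractProcessor {
    final Set<String> found = new TreeSet<>();

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        Trees trees = Trees.instance(processingEnv); // bridge from elements to syntax trees
        for (Element root : roundEnv.getRootElements()) {
            TreePath path = trees.getPath(root);
            if (path == null) continue; // no source available for this element
            new TreeScanner<Void, Void>() {
                @Override public Void visitIdentifier(IdentifierTree node, Void p) {
                    String name = node.getName().toString();
                    if (isRoman(name)) found.add(name);
                    return null;
                }
            }.scan(path.getLeaf(), null);
        }
        return false; // do not claim any annotations
    }

    static boolean isRoman(String s) {
        return !s.isEmpty()
                && s.matches("M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})");
    }

    /** Runs the processor over in-memory source and returns the matching identifiers. */
    static Set<String> findIn(final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        RomanIdentifierFinder finder = new RomanIdentifierFinder();
        JavaCompiler.CompilationTask task = compiler.getTask(null, null, null,
                Collections.singletonList("-proc:only"), null, Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(finder));
        task.call();
        return finder.found;
    }
}
```

A transforming version would replace the scanner with a TreeTranslator like the one above, which means compiling against javac internals and, on recent JDKs, opening those packages explicitly.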
Here are additional resources:
Hacker's guide to the java compiler
Javac hacker resources
I don't know if project Lombok (cited in the other answer) uses the same technique, but I guess yes.
Charles Nutter, the tech lead of JRuby, extended javac with literal regular expressions. He had to change about 3 lines of code, as far as I recall.
See http://twitter.com/headius/status/1319031705
And here is an awesome tutorial on how to add a new operator to javac, http://www.ahristov.com/tutorial/java-compiler.html
For more links like that, see my list of Links for javac hackers.