ANTLR 4 and StringTemplate 4 - using tree walker with templates - java

Disclaimer: I never used Java before last month, and I had never heard of ANTLR or StringTemplate before then either. For my internship this summer I was given a project using tools that nobody else at the company has ever used. Everyone "has faith in me" that I will "figure it out." Hence the huge gaps in my understanding. I love this project and I've learned a ton, so don't take this as complaining. I just want to make it work.
Right now I'm working on a pretty printer proof of concept for an old domain-specific language. My ANTLR grammar is producing nice parse trees, and I'm able to output simple StringTemplate examples like the ones in the introduction.
Say I have a simple template in my .stg file:
module(type, name, content) ::= "<type> MODULE <name>; <content>; END MODULE."
In Java I'm able to use add() to set the values for each of the template arguments:
STGroup g = new STGroupFile("example.stg");
ST st = g.getInstanceOf("module");
st.add("type", "MAIN");
st.add("name", "test");
st.add("content", "abc");
System.out.println(st.render());
// prints "MAIN MODULE test; abc; END MODULE."
How do I get ANTLR and ST to read in a text file and produce pretty-printed output?
MAIN MODULE test;
abc;
END MODULE.
Should become
MAIN MODULE test; abc; END MODULE.
For example. (That's not how I plan to format all the output, don't worry. It'll pretty print much prettier than that.)
In this answer I learned that ANTLR 4 generates walkers automatically. Assuming my ANTLR grammar is correct/well-written, how do I match up the ANTLR rules/tokens to my template arguments to generate output from an input text file?
If I missed it in the documentation somewhere let me know. There are far fewer examples for ANTLR 4 and ST 4 than for the previous versions.

Given a parser rule
r : a b c ;
the generated parse-tree will contain a node rContext with child nodes aContext, bContext, cContext, each potentially having further child nodes, for each instance in the input stream where the rule is matched.
The walk will produce the series of listener (or visitor) calls
enterR
enterA
....
exitA
enterB
....
exitB
enterC
....
exitC
exitR
Each call contains a reference to the instance context within the parse-tree, giving access to the actual values that could be passed to ST in prefix/suffix order relative to intervening child nodes.
Where simple prefix/suffix access ordering alone is not sufficient (or undesirably complex), use one or more prior parse-tree walks to analyze the more complex nodes and annotate the node instances with the analysis products. In the final output walk, reference the analysis products for the values to pass to ST.
Depending on actual circumstances, it would not be unusual for the analysis of a node to collect values from its children, pass the lot to a template for detail expansion, formatting, etc, and store the result as a node annotation string pending output in the final output walk.
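For the original question, here is a minimal listener sketch showing how values pulled from a rule context can be handed to the "module" template via add(). The grammar, rule, and context names (MyGrammar, module, type, name, content) are hypothetical stand-ins, not taken from the actual grammar:

import org.stringtemplate.v4.ST;
import org.stringtemplate.v4.STGroup;
import org.stringtemplate.v4.STGroupFile;

public class PrettyPrintListener extends MyGrammarBaseListener {
    private final STGroup templates = new STGroupFile("example.stg");

    @Override
    public void exitModule(MyGrammarParser.ModuleContext ctx) {
        ST st = templates.getInstanceOf("module");
        st.add("type", ctx.type().getText());       // assumes a 'type' sub-rule
        st.add("name", ctx.name().getText());       // assumes a 'name' sub-rule
        st.add("content", ctx.content().getText()); // assumes a 'content' sub-rule
        System.out.println(st.render());
    }
}

The tree itself comes from the usual boilerplate (lexer over a CharStream, CommonTokenStream, parser, start rule), and the listener is driven with ParseTreeWalker.DEFAULT.walk(listener, tree).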
Update
To annotate parse-tree nodes, you can use ParseTreeProperty.
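A minimal sketch of that idea, assuming a String analysis product (the payload type and helper names are illustrative only):

import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeProperty;

public class Annotations {
    // one ParseTreeProperty per kind of analysis product you want to attach
    private final ParseTreeProperty<String> rendered = new ParseTreeProperty<>();

    public void setRendered(ParseTree node, String text) { rendered.put(node, text); }

    public String getRendered(ParseTree node) { return rendered.get(node); }
}

An analysis walk calls setRendered on the nodes it processes; the final output walk reads the values back with getRendered.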
Where the annotation set becomes more than 'trivial', a typical option is to associate a node-type specific 'decorator' class instance with a parse-tree node/context instance largely as a better data container. Of course, the node-type specific methods can then be embedded into their corresponding decorator classes to keep concerns nicely separated.
The listener methods become something like this:
public void exitNodeB(NodeBContext ctx) {
    super.exitNodeB(ctx);
    NodeBDescriptor descriptor = (NodeBDescriptor) getDescriptor(ctx);
    if (analysisPhase) {
        descriptor.process(); // node-type specific analysis
    } else {
        descriptor.output(); // node-type specific output generation
    }
}
The specifics of when to analyze (on enter, exit, or both) and when to output will be dependent on the particular application. Implement to suit your purposes.

Related

ANTLR4 dynamic token type

The language that I am lexing requires the ability to hot-swap keywords depending on runtime configuration.
It's relatively simple to do this, so long as you are OK with embedding target-specific code in your grammar (Java):1
lexer grammar LanguageLexer;

tokens {
    If, Else, While // etc
}

@header {
    import java.util.Map;
}

@members {
    private Map<String, Integer> keywords;

    public LanguageLexer(CharStream input, Map<String, Integer> keywords) {
        this(input);
        this.keywords = keywords;
    }
}

WS: [ \n\t\r]+ -> skip;
ID: [a-zA-Z]+ { if (keywords.containsKey(getText())) setType(keywords.get(getText())); };
However, I would like to remove all target-specific code from my .g4 file, as my .g4s will be used across multiple target languages for separate projects.
In a Parser, you can use a Listener to remove embedded actions and decouple the grammar from application-specific code. However, if there exists a way to do this at the Lexer level2, I have yet to find it (thus asking this question).
The way to accomplish this seems to be to wrap the TokenStream pulled from the Lexer. This wrapping TokenStream would read Tokens as they were provided, and apply the transformation currently in an embedded action to any ID tokens present.
This (in theory) would not be difficult to implement; however, this feels like functionality that should be possible with just the already defined ANTLR symbols. So, the question is: is it possible to conditionally change the type of tokens passing through a TokenStream within the existing ANTLR system? If not, what is the lowest-friction way of accomplishing that task? An example using the Java library would be preferred, as that is the one I am most familiar with.
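For what it's worth, a sketch of that wrapping approach (the wrapper class name and keyword map are illustrative; only the ANTLR runtime types are real):

import java.util.Map;
import org.antlr.v4.runtime.*;

public class KeywordSwappingTokenSource implements TokenSource {
    private final TokenSource delegate;
    private final Map<String, Integer> keywords; // token text -> replacement token type

    public KeywordSwappingTokenSource(TokenSource delegate, Map<String, Integer> keywords) {
        this.delegate = delegate;
        this.keywords = keywords;
    }

    @Override
    public Token nextToken() {
        Token t = delegate.nextToken();
        Integer newType = keywords.get(t.getText());
        if (newType != null && t instanceof WritableToken) {
            ((WritableToken) t).setType(newType); // re-label tokens whose text is a configured keyword
        }
        return t;
    }

    // remaining TokenSource methods just delegate
    @Override public int getLine() { return delegate.getLine(); }
    @Override public int getCharPositionInLine() { return delegate.getCharPositionInLine(); }
    @Override public CharStream getInputStream() { return delegate.getInputStream(); }
    @Override public String getSourceName() { return delegate.getSourceName(); }
    @Override public void setTokenFactory(TokenFactory<?> factory) { delegate.setTokenFactory(factory); }
    @Override public TokenFactory<? extends Token> getTokenFactory() { return delegate.getTokenFactory(); }
}

The parser would then be fed new CommonTokenStream(new KeywordSwappingTokenSource(lexer, keywords)) instead of wrapping the lexer directly.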
And as a sub-question: if I end up creating a TokenTransformationStream for my required targets, would it be worth suggesting adding it to the existing libraries? (I can create symbols for all current supplied targets.)
1 Yes, this will crash if you construct a Lexer with the regular constructor. In a real application, it might be worth fixing that, but for this example, it doesn't matter.
2 I feel this is an appropriate task for the lexer level for a couple reasons. The main reason is that it seems common practice to pass keywords as keyword tokens always, and then, if necessary, allow them as identifiers at the parser level (such as context-sensitive keywords). Also, other questions asking simply how to achieve this effect suggest a method basically equivalent to the above provided embedded actions solution.
This may not turn out to be the answer to the question, but it's simply too long for a comment.
I suggested lexer modes in the comments because I was focused on the "hot-swap keywords" part. I don't know why you need to change the token type, but if you use lexer modes maybe you will not care about it.
The only catch is that there need to be some keywords which indicate a change of lexer mode. Basically, one lexer mode would be a sub-lexer grammar (of sorts).
RUNTIME_CFG_1 : 'runtime_cfg_1' -> mode(m_CFG_1);
...

mode m_CFG_1;
KEYWORD1 : 'key1';
...
If some keywords are the same across modes, you can also use the type lexer command* to explicitly set the type of the token.
*These are the lexer commands: the instructions after ->, such as mode, skip, type, etc.
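A tiny illustration of that lexer command (token and mode names made up), re-labelling a keyword matched inside a mode with an already-defined token type:

mode m_CFG_2;
KEYWORD1_ALT : 'key1_alt' -> type(KEYWORD1);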

Recording line/character positions from generated FreeMarker templates

Is there a way to record specific line/character positions from generated FreeMarker templates? The purpose would be to highlight specific sections of the generated output file without having to parse the generated output file.
For example, let's say I have this template:
function foo()
{
ordinary_crap();
ordinary_crap();
do_something_special();<#mark foospecial>
ordinary_crap();
}
function bar()
{
ordinary_crap();
do_something_really_special();<#mark barspecial>
ordinary_crap();
ordinary_crap();
}
function baz()
{
foo();<#mark foo_call_1>
ordinary_crap();
bar();<#mark bar_call_1>
}
I want the <#mark> directive not to yield any generated output, but to associate the mark names foospecial, barspecial, foo_call_1 and bar_call_1 with the line and position-within-a-line of where the <#mark> directives are located in the generated output. In the example above I showed independent single points, but it would also be useful to have begin/end pairs to mark specific ranges.
The alternatives I can see are
parsing the output independently -- not always possible, for example what if there are several identical instances of something in the output, and I want to highlight a specific one of those?
adding "mark hints" and removing them via my own postprocessing step. For example
<mark name="years">Fourscore and seven</mark> years ago
something really brilliant happened to a really nice guy named
<mark name="niceguyname">Fred</mark>.
Then I could postprocess this and remove the <mark > tags (assuming they don't conflict with the rest of the content), recording positions as I go.
But both of these seem kind of hacky.
From your TemplateDirectiveModel implementation (I assume that's how you implement mark, not with #macro), call env.getCurrentDirectiveCallPlace(). The returned DirectiveCallPlace has getBeginColumn() and getBeginLine() methods.
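A minimal sketch of such a directive (assumes FreeMarker 2.3.22+ for getCurrentDirectiveCallPlace; the MarkRegistry sink is hypothetical), registered as a shared variable so templates call it as <@mark name="foospecial"/>:

import java.io.IOException;
import java.util.Map;

import freemarker.core.DirectiveCallPlace;
import freemarker.core.Environment;
import freemarker.template.TemplateDirectiveBody;
import freemarker.template.TemplateDirectiveModel;
import freemarker.template.TemplateException;
import freemarker.template.TemplateModel;
import freemarker.template.TemplateScalarModel;

public class MarkDirective implements TemplateDirectiveModel {
    @Override
    public void execute(Environment env, Map params, TemplateModel[] loopVars,
                        TemplateDirectiveBody body) throws TemplateException, IOException {
        Object nameParam = params.get("name");
        String name = nameParam instanceof TemplateScalarModel
                ? ((TemplateScalarModel) nameParam).getAsString()
                : String.valueOf(nameParam);

        DirectiveCallPlace place = env.getCurrentDirectiveCallPlace();
        // record where the directive was called (template coordinates); the directive emits no output
        MarkRegistry.record(name, place.getBeginLine(), place.getBeginColumn());
    }
}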

Problems rendering properties of model objects passed to template

I'm a StringTemplate newbie with a really basic question. I'm trying to use it to send automated emails. I've read as much as I can to digest what is out there. I'm starting with a simple test case and having trouble getting properties of objects to render. As a test case I have the following in my template file email.stg.
delimiters "$", "$"
activate(person) ::= <<$person.personFirstName$>>
I'm trying to pass my Person object and have the template render the personFirstName property. This would call a getter Person.personFirstName() which is public.
My java code looks like:
Person _thePerson = //fetched from database
STGroup group = new STGroupFile("/tmp/email.stg");
ST st = group.getInstanceOf("activate");
st.add("person", _thePerson);
System.out.println("the person first name is: " + _thePerson.personFirstName());
System.out.println(st.render());
My output reflects that the personFirstName property is available via java but my template does not render it.
the person first name is: Ivan
<nothing is returned here>
If I limit the activate template to this:
activate(person) ::= <<$person$>>
I get the following result where the person object is rendered as _thePerson.toString().
the person first name is: Ivan
999999999 - Johnson, Ivan G
Any help would be greatly appreciated so I can move on to the more complex template that I'm trying to get to.
Answering my own question: I think this is the answer from the Introduction here - https://theantlrguy.atlassian.net/wiki/display/ST4/Introduction.
"...in general they follow the old JavaBeans naming convention. StringTemplate looks for methods getP(), isP(), hasP() first. If it fails to find one of those methods, it looks for a field called p."
I took that to mean that "p" would work as a method name as well but was wrong. I'm using Enterprise Object Framework and, unfortunately, my model .java files' attribute accessors do not use the "get*" convention which means ST never requests them. They are also not stored as fields. I'll have to think of a way around it but I don't think I'm inclined to change large scale model frameworks to accommodate this. If I add cover (get*) methods it works but that is not the best solution.
Generally, I've never encountered this issue since WebObjects template engine will render with or without "get*."
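One workaround is sketched below: register a model adaptor that resolves a property by calling a no-arg method with exactly the property's name, so $person.personFirstName$ invokes personFirstName() directly. The signature shown is the non-generic ModelAdaptor of ST 4.0.x (newer releases use a generic ModelAdaptor<T> with the same shape); the adaptor class name is my own:

import java.lang.reflect.Method;

import org.stringtemplate.v4.Interpreter;
import org.stringtemplate.v4.ModelAdaptor;
import org.stringtemplate.v4.ST;

public class BareAccessorAdaptor implements ModelAdaptor {
    @Override
    public Object getProperty(Interpreter interp, ST self, Object o,
                              Object property, String propertyName) {
        try {
            // call a no-arg method named exactly like the property, e.g. personFirstName()
            Method m = o.getClass().getMethod(propertyName);
            return m.invoke(o);
        } catch (ReflectiveOperationException e) {
            return null; // a real adaptor would throw STNoSuchPropertyException here
        }
    }
}

// registration, before rendering:
// STGroup group = new STGroupFile("/tmp/email.stg");
// group.registerModelAdaptor(Person.class, new BareAccessorAdaptor());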

Refactoring large data object

What are some common strategies for refactoring large "state-only" objects?
I am working on a specific soft-real-time decision support system which does online modeling/simulation of the national airspace. This piece of software consumes a number of live data feeds, and produces a once-per-minute estimate of the "state" of a large number of entities in the airspace. The problem breaks down neatly until we hit what is currently the lowest-level entity.
Our mathematical model estimates/predicts upwards of 50 parameters for a timeline of several hours into the past and future for each of these entities, roughly once per minute. Currently, these records are encoded as a single Java class with a lot of fields (some get collapsed into an ArrayList). Our model is evolving, and the dependencies among the fields are not yet set in stone, so each instance wanders through a convoluted model, accumulating settings as it goes along.
Currently we have something like the following, which uses a builder pattern approach to build up the contents of the record and to enforce the known dependencies (as a check against programmer error as we evolve the model). Once the estimate is done, we convert the below into an immutable form using a .build() type method.
final class OneMinuteEstimate {
    enum EstimateState { INFANT, HEADER, INDEPENDENT, ... };
    EstimateState state = EstimateState.INFANT;

    // "header" stuff
    DateTime estimatedAtTime = null;
    DateTime stamp = null;
    EntityId id = null;

    // independent fields
    int status1 = -1;
    ...

    // dependent/complex fields...
    ... goes on for 40+ more fields...

    void setHeaderFields(...)
    {
        if (!EstimateState.INFANT.equals(state)) {
            throw new IllegalStateException("Must be in INFANT state to set header");
        }
        ...
    }
}
Once a very large number of these estimates are complete, they are assembled into timelines where aggregate patterns/trends are analyzed. We have looked at using an embedded database but have struggled with performance issues; we'd rather get this sorted out in terms of data modeling and then incrementally move portions of the soft-real-time code into an embedded data store.
Once the "time sensitive" pieces of this are done, the products are flushed to flat files and a database.
Problems:
It's a giant class, with way too many fields.
There is very little behavior encoded in the class; it's mostly a holder for data fields.
Maintaining the build() method is extremely cumbersome.
It feels clumsy to manually maintain a "state machine" abstraction merely for the purpose of ensuring that a large number of dependent modeling components are properly populating a data object, but it has saved us a lot of frustration as the model evolves.
There is a lot of duplication, particularly when the records described above are aggregated into very similar "rollups" which amount to rolling sums/averages or other statistical products of the above structure in time series.
While some of the fields could be clumped together, they are all logically "peers" of one another, and any breakdown we've tried has resulted in having behavior/logic artificially split and needing to reach two levels deep in indirection.
Out-of-the-box ideas are entertained, but this is something we need to evolve incrementally. Before anyone else says it, I'll note that one could suggest that our mathematical model is insufficiently crisp if the data representation for that model is this hard to get hold of. Fair point, and we're working on that, but I think that's a side-effect of an R&D environment with a lot of contributors and a lot of concurrent hypotheses in play.
(Not that it matters, but this is implemented in Java. We use HSQLDB or Postgres for output products. We don't use any persistence framework, partly out of a lack of familiarity, partly because we have enough performance trouble with just the database alone and hand-coded storage routines... we're skeptical of moving towards additional abstraction.)
I had much of the same problem you did.
At least I think I did, sounds like I did. Representation was different, but at 10,000 feet, sounds pretty much the same. Crapload of discrete, "arbitrary" variables and a bunch of ad hoc relationships among them (essentially business driven), subject to change at a moment's notice.
You also have another issue, which you sorta mentioned, and that was the performance requirement. Sounds like faster is better, and likely a slow perfect solution would be tossed out for the fast lousy one, simply because the slower one can't meet a baseline performance requirement, no matter how good it is.
To put it simply, what I did was I designed a simple domain specific rule language for my system.
The entire point of the DSL was to implicitly express relationships and package them up in to modules.
Very crude, contrived example:
D = 7
C = A + B
B = A / 5
A = 10
RULE 1: IF (C < 10) ALERT "C is less than 10"
RULE 2: IF (C > 5) ALERT "C is greater than 5"
RULE 3: IF (D > 10) ALERT "D is greater than 10"
MODULE 1: RULE 1
MODULE 2: RULE 3
MODULE 3: RULE 1, RULE 2
First, this is not representative of my syntax.
But you can see from the Modules that there are 3 simple rules.
The key though, is that it's obvious from this that Rule 1 depends on C, which depends on A and B, and B depends on A. Those relationships are implied.
So, for that module, all of those dependencies "come with it". You can see if I generated code for Module 1 it might look something like:
public void module_1() {
    int a = 10;
    int b = a / 5;
    int c = a + b;
    if (c < 10) {
        alert("C is less than 10");
    }
}
Whereas if I created Module 2, all I would get is:
public void module_2() {
    int d = 7;
    if (d > 10) {
        alert("D is greater than 10.");
    }
}
In Module 3 you see the "free" reuse:
public void module_3() {
    int a = 10;
    int b = a / 5;
    int c = a + b;
    if (c < 10) {
        alert("C is less than 10");
    }
    if (c > 5) {
        alert("C is greater than 5");
    }
}
So, even though I have one "soup" of rules, the Modules root the base of the dependencies, and thus filter out the stuff it doesn't care about. Grab a module, shake the tree and keep what's left hanging.
My system used the DSL to generate source code, but you can easily have it create a mini runtime interpreter as well.
Simple topological sorting handled the dependency graph for me.
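For illustration, here is a minimal sketch (all names hypothetical) of that dependency handling: a depth-first topological sort over a map from each variable to the variables it depends on gives the order in which assignments must be emitted for a module:

import java.util.*;

public class DependencyOrder {
    // returns the variables needed by 'root', dependencies first, root last
    public static List<String> order(String root, Map<String, List<String>> deps) {
        List<String> sorted = new ArrayList<>();
        visit(root, deps, new HashSet<>(), sorted);
        return sorted;
    }

    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> visited, List<String> sorted) {
        if (!visited.add(name)) return; // already handled (cycles ignored in this sketch)
        for (String dep : deps.getOrDefault(name, Collections.emptyList())) {
            visit(dep, deps, visited, sorted);
        }
        sorted.add(name);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("C", Arrays.asList("A", "B"));
        deps.put("B", Arrays.asList("A"));
        System.out.println(order("C", deps)); // [A, B, C] -> the body of module_1/module_3
    }
}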
So, the nice thing about this is that while there was inevitable duplication in the final, generated logic, at least across modules, there wasn't any duplication in the rule base. What you as a developer/knowledge worker maintain is the rule base.
What is also nice is that you can change an equation and not worry so much about the side effects. For example, if I change C to A / 2, then, suddenly, B drops out completely. But the rule for IF (C < 10) doesn't change at all.
With a few simple tools, you can show the entire dependency graph, you can find orphaned variables (like B), etc.
By generating source code, it's going to run as fast as you want.
In my case, it was interesting to see a rule drop a single variable and see 500 lines of source code vanish from the resulting module. That's 500 lines I didn't have to crawl through by hand and remove during maintenance and development. All I had to do was change a single rule in my rule base and let "magic" happen.
I was even able to do some simple peephole optimization and eliminate variables.
It's not that hard to do. Your rule language can be XML, or a simple expression parser. No reason to go full boat Yacc or ANTLR on it if you don't want to. I'll put a plug in for S-Expressions, no grammar needed, brain dead parsing.
Spreadsheets also make a great input tool, actually. Just be strict on the formatting. Kind of sucks for merging in SVN (so, Don't Do That), but end users love it.
You may well be able to get away with an actual rule based system. My system wasn't dynamic at runtime, and didn't really need sophisticated goal seeking and inference, so I didn't need the overhead of such a system. But if one works for you out of the box, then happy day.
Oh, and for an implementation note, for those who don't believe you can hit the 64K code limit in a Java method, well I can assure you it can be done :).
Splitting a Large Data Object is very similar to Normalizing a Large Relational Table (first and second normal form). Follow the rules to reach at least second normal form and you may have a good decomposition of the original class.
From experience working also with R&D stuff with soft real-time performance constraints (and sometimes monster fat classes), I would suggest NOT using OR mappers. In such situations, you'll be better off touching the metal and working directly with JDBC result sets. This is my suggestion for apps with soft real-time constraints and massive amounts of data items per package. More importantly, if the number of distinct classes (not class instances, but class definitions) that need to be persisted is large, and you also have memory constraints in your specs, you will also want to avoid ORMs like Hibernate.
Going back to your original question:
What you seem to have is a typical problem of 1) mapping multiple data items into an OO model and 2) such multiple data items not exhibiting a good way of grouping or segregation (and any attempt at grouping tends simply not to feel right). Sometimes the domain model does not lend itself to such aggregation, and coming up with an artificial way of doing so typically ends up in compromises that don't satisfy all design requirements and desires.
To make matters worse, an OO model typically requires/expects you to have all the items present in a class as the class's fields. Such a class is typically without behavior, so it is just a struct-like construct, aka data envelope or data shuttle.
But such situations beg the following questions:
Does your application need to read/write all 40, 50+ data items at once, always?
*Must all data items be always present?*
I do not know the specifics of your problem domain, but in general I've found that we rarely ever need to deal with all data items at once. This is where a relational model shines, because you don't have to query all rows from a table at once. You only pull those you need as projections of the table/view in question.
In a situation where we have a potentially large number of data items, but on average the number of data items being passed down the wire is less than the maximum, you'd be better off using a Properties pattern.
Instead of defining a monster envelope class holding all items :
// java pseudocode
class envelope
{
    field1, field2, field3... field_n;
    ...
    setFields(m1,m2,m3,...m_n){field1=m1; .... };
    ...
}
Define a dictionary (based on a map for example):
// java pseudocode
public enum EnvelopeField {field1, field2, field3,... field_n}

interface Envelope //package visible
{
    // typical map-based read fields.
    Object get(EnvelopeField field);
    boolean isEmpty();

    // new methods similar to existing ones in java.lang.Map, but
    // more semantically aligned with envelopes and fields.
    Iterator<EnvelopeField> fields();
    boolean hasField(EnvelopeField field);
}

// a "marker" interface
// code that only needs to read envelopes must operate on
// these interfaces.
public interface ReadOnlyEnvelope extends Envelope {}

// the read-write version of envelope, notice that
// it inherits from Envelope, but not from ReadOnlyEnvelope.
// this is done to make it difficult (but not impossible
// unfortunately) to "cast-up" a read only envelope into a
// mutable one.
public interface MutableEnvelope extends Envelope
{
    Object put(EnvelopeField field, Object value);

    // to "cast-down" or "narrow" into a read only version type that
    // cannot directly be "cast-up" back into a mutable.
    ReadOnlyEnvelope readOnly();
}

// the standard interface for map-based envelopes.
public interface MapBasedEnvelope extends
    Map<EnvelopeField, Object>,
    MutableEnvelope
{
}

// package visible, not public
class EnvelopeImpl extends HashMap<EnvelopeField, Object>
    implements MapBasedEnvelope, ReadOnlyEnvelope
{
    // put and isEmpty are inherited from HashMap; get needs the
    // field-typed overload required by Envelope.
    public Object get(EnvelopeField field){ return super.get(field); }
    ...
    public Iterator<EnvelopeField> fields(){ return this.keySet().iterator(); }
    public boolean hasField(EnvelopeField field){ return this.containsKey(field); }

    // the typecast is redundant, but it makes the intention obvious in code.
    public ReadOnlyEnvelope readOnly(){ return (ReadOnlyEnvelope)this; }
}

public final class EnvelopeFactory
{
    public static MapBasedEnvelope newEnvelope(){ return new EnvelopeImpl(); }
}
No need to set up read-only internal flags. All you need to do is downcast your envelope instances as Envelope instances (that only provide getters).
Code that expects to read should operate on read-only envelopes and code that expects to change fields should operate on mutable envelopes. Creation of the actual instances would be compartmentalized in factories.
That is, you use the compiler to enforce things to be read-only (or allow things to be mutable) by establishing some code conventions, rules governing what interfaces to use where and how.
You can layer your code into sections that need to write, separate from code that only needs to read. Once that's done, simple code reviews (or even grep) can identify code that is using the wrong interface.
Problems:
Non-public Parent Interface:
Envelope is not declared as a public interface to prevent erroneous/malicious code from casting a read-only envelope down to a base envelope and then back to a mutable envelope. The intended flow is from mutable to read-only only - it is not intended to be bi-directional.
The problem here is that extension of Envelope is restricted to the package that contains it. Whether that is a problem will depend on the particular domain and intended usage.
Factories:
The problem is that factories can (and most likely will) be very complex. Again, the nature of the beast.
Validation:
Another problem introduced with this approach is that now you have to worry about code that expects field X to be present. Having the original monster envelope class partially frees you from that worry because, at least syntactically, all fields are there...
... whether the fields are set or not, that was another matter that still remains with this new model I'm proposing.
So if you have client code that expects to see field X, the client code has to throw some type of exception if the field is not present (or compute or read a sensible default somehow). In such cases, you will have to
Identify patterns of field presence. Clients that expect field X to be present might be grouped separately (layered apart) from clients that expect some other field to be present.
Associate custom validators (proxies to read-only envelope interfaces) that either throw exceptions or compute default values for missing fields according to some rules (rules provided programmatically, with an interpreter, or with a rules engine.)
Lack of Typing:
This might be debatable, but people used to working with static typing might feel uneasy about losing its benefits by going to a loosely typed, map-based approach. The counter-argument is that most of the web works on a loose-typing approach, even on the Java side (JSTL, EL).
Problems aside, the larger the maximum number of possible fields and the lower the average number of fields present at any given time, the more effective this approach will be with respect to performance. It adds additional code complexity, but that's the nature of the beast.
That complexity doesn't go away; it will be present either in your class model or in your validation code. Serialization and transfer down the wire are much more efficient, though, especially if you expect massive numbers of individual data transfers.
Hope it helps.
Actually this looks like a frequent problem that game developers face, bloated classes holding numerous variables and methods because of a deep inheritance tree etc.
There's this blog post about how and why to select composition over inheritance, maybe it would help.
One way you may be able to intelligently break up a large data class is to look at patterns of access by client classes. For example, if a set of classes only accesses fields 1-20 and another set of classes only accesses fields 25-30, maybe those groups of fields belong in separate classes.
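A toy illustration of that split (field names invented): fields that client code always reads together move into their own small class, and the estimate composes them:

final class HeaderInfo {
    java.time.Instant estimatedAtTime;
    java.time.Instant stamp;
    String entityId;
}

final class StatusBlock {
    int status1;
    int status2;
    // ... remaining status fields
}

final class SplitEstimate {
    HeaderInfo header;     // consumed by header-only clients
    StatusBlock statuses;  // consumed by status-only clients
}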

Java source refactoring of 7000 references

I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's built-in refactoring is not up to the task, but I really want to automate this.
So, how can I get the job done?
Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source files; creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).
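A hedged sketch of that visitor (the javaparser types are real; the rewrite strategy and file handling are illustrative only):

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.expr.MethodCallExpr;
import com.github.javaparser.ast.expr.StringLiteralExpr;
import com.github.javaparser.ast.visitor.ModifierVisitor;
import com.github.javaparser.ast.visitor.Visitable;

public class LogCallRewriter extends ModifierVisitor<Void> {
    @Override
    public Visitable visit(MethodCallExpr call, Void arg) {
        super.visit(call, arg);
        if (call.getNameAsString().equals("log") && call.getArguments().size() == 1) {
            // append the two new arguments: the class, and the enclosing method's name
            String methodName = call.findAncestor(MethodDeclaration.class)
                    .map(MethodDeclaration::getNameAsString)
                    .orElse("unknown");
            call.addArgument(StaticJavaParser.parseExpression("this.getClass()"));
            call.addArgument(new StringLiteralExpr(methodName));
        }
        return call;
    }

    public static void main(String[] args) throws Exception {
        CompilationUnit cu = StaticJavaParser.parse(new File(args[0]));
        cu.accept(new LogCallRewriter(), null);
        // write to a shadow file and diff, rather than overwriting the original
        Files.write(Paths.get(args[0] + ".refactored"), cu.toString().getBytes());
    }
}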
Eclipse is able to do that using Refactor -> Change Method signature and provide default values for the new parameters.
For the class parameter the default value should be this.getClass(), but you are right in your comment: I don't know how to do it for the method name parameter.
IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.
I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.
Maybe I'm being naive, but why can't you just overload the method name?
void thing(String paramA) {
    thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}

void thing(String paramA, Class<?> paramB, String paramC) {
    // new method
}
Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
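A small sketch of that suggestion (the log body is illustrative): the caller's class and method are read from the stack trace inside log(String) itself, so none of the 7000 call sites change:

public static void log(String text) {
    // [0] is typically Thread.getStackTrace, [1] is this log method, [2] is the caller
    StackTraceElement caller = Thread.currentThread().getStackTrace()[2];
    System.out.println(caller.getClassName() + "." + caller.getMethodName() + ": " + text);
}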
If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log(\1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
Try refactoring using IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising, I upvoted it.)
I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete java systems of source, builds abstract syntax trees (for the entire set of code).
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.
In fact, your problem cannot be solved with a click'n'play engine that simply replaces all occurrences of
log("some weird message");
by
log("some weird message", this.getClass(), new Exception().getStackTrace()[1].getMethodName());
as that has little chance of working in all cases (static methods, for example).
I would suggest you take a look at Spoon. This tool allows source code parsing and transformation, letting you achieve your operation in a slow (obviously, since it is code-based) but controlled way.
However, you could also consider replacing your actual method with one that explores the stack trace to get the information or, even better, internally using log4j and a log formatter that displays the correct information.
I would search and replace log( with log(#class, #methodname,
Then write a little script in any language (even java) to find the class name and the method names and to replace the #class and #method tokens...
Good luck
If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
public void log(String text)
{
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    new Throwable().printStackTrace(pw);
    pw.flush();
    sw.flush();
    String stackTraceAsLog = sw.toString();
    // do something with text and stackTraceAsLog
}
