All C++ functions are of the form
type name ( parameters ) { … }
To identify the regex, I'm using
regex = "...";
pattern = Pattern.compile(regex);
matcher = pattern.matcher(line);
if (matcher.matches())
{
...
}
I can only realistically search for the "type name (" part, since I am using a line reader and function definitions can span multiple lines, and I'm not sure what the regex should be. .*\\b.*\\( was my latest guess, but it doesn't work. Any help would be greatly appreciated.
Unfortunately, there is no general regular expression that can match all function definitions.
The C++ grammar specification allows you to parenthesize the name of any variable as many times as you'd like. For example, you can write
int ((((((a))))));
to declare a variable named a. This means that you can define functions like this:
void whyWouldYouDoThis(int (((((becauseICan)))))) {
/* ... */
}
The problem with this is that it means that function declarations can have arbitrarily-complicated nesting of parentheses. You can prove that, in general, sets of strings that require keeping track of balanced parentheses cannot be matched by regular expressions (formally, that the language of those strings is not regular), and unfortunately this applies here.
This is definitely really contrived, but there are cases where you will see lots of nested parentheses. For example, consider this function:
void thisFunctionTakesACallback(void imACallbackFunction()) {
/* ... */
}
Here, there's an extra layer of parentheses induced by the fact that the function argument is itself of function type. If that function took a callback, you could see something like this:
void thisFunctionTakesACallback(void soDoesThisOne(void imACallbackInACallback())) {
/* ... */
}
If you're looking to find all function declarations, you might be better off using a parser and defining a grammar for what you're looking for, since these patterns are context-free. You could alternatively consider hooking into a compiler front-end (g++ can produce ASTs for you in the GIMPLE or GENERIC framework, for example) and using that to extract what you're looking for. That guarantees you won't miss anything.
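As a rough illustration of why nesting defeats a line-oriented regex, here is a minimal Java sketch that finds the parenthesis closing a parameter list by keeping a depth counter, which is exactly the kind of state a regular expression cannot hold. The names ParenScanner and findClosingParen are made up for this example, and the sketch ignores comments, string literals and templates, so it is not a substitute for a real parser.

```java
class ParenScanner {
    /**
     * Returns the index of the ')' that closes the '(' at openIndex,
     * or -1 if the parentheses never balance.
     * Assumption: text.charAt(openIndex) is '('.
     */
    static int findClosingParen(String text, int openIndex) {
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;                 // one level deeper
            } else if (c == ')') {
                depth--;                 // one level back out
                if (depth == 0) {
                    return i;            // matched the opening parenthesis
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String decl = "void f(void g(void h())) {";
        int open = decl.indexOf('(');
        int close = findClosingParen(decl, open);
        System.out.println(decl.substring(open, close + 1)); // prints (void g(void h()))
    }
}
```

Because the counter can grow without bound, the same loop handles any nesting depth, whereas a fixed regex can only be written for a bounded number of levels.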
Given a string, how can I validate the contents are a valid PCRE within Java? I don't want to use the regex in any way within Java, just validate its contents.
java.util.regex.Pattern is almost good enough, but its javadoc points out how it differs from Perl.
In detail, there's a system with 3 relevant components:
Component A - Generates, among other things, Perl-compliant regular expressions (PCREs) to be evaluated at runtime by some other component capable of executing PCREs (component C). What's "generated" here may be coming from a human.
Component B - Validates the data generated by component A and, if valid, shuttles it over to the runtime (component C).
Component C - Some runtime that evaluates PCREs. This could be a Perl VM, a native process using the PCRE library, Boost.Regex, etc., or something else that can compile/execute a Perl-compliant regular expression.
Now, component B is implemented in Java. As mentioned above, it needs to validate a string potentially containing a PCRE, but does not need to execute it.
How could we do that?
One option would be something like:
public static boolean isValidPCRE(String str) {
try {
Pattern.compile(str);
} catch (PatternSyntaxException e) {
return false;
}
return true;
}
The problem is that java.util.regex.Pattern is designed to work with a regular expression syntax that is not exactly Perl-compliant. The javadoc makes that quite clear.
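To make the mismatch concrete, here is a small sketch (the class and method names are invented for illustration) showing that the Pattern.compile check errs in both directions: it rejects the PCRE recursion construct (?R), and it accepts \p{javaLowerCase}, a character class that exists only in Java's regex dialect.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PcreCheckDemo {
    /** Same idea as isValidPCRE above: true if java.util.regex accepts the string. */
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // (?R) recurses into the whole pattern in PCRE; Java has no such construct.
        String pcreRecursion = "\\((?:[^()]|(?R))*\\)";
        // \p{javaLowerCase} is backed by Character.isLowerCase and is Java-only.
        String javaOnlyClass = "\\p{javaLowerCase}+";

        System.out.println(compilesInJava("a+b"));          // true
        System.out.println(compilesInJava(pcreRecursion));  // false, although it is valid PCRE
        System.out.println(compilesInJava(javaOnlyClass));  // true, although it is not PCRE
    }
}
```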
So, given a string, how can I validate the contents are a valid PCRE within Java?
Note: There are some differences between libPCRE and Perl, but they are pretty minor. To a certain degree, that is true of Java's syntax as well. However, the question still stands.
I have a property file of key/value pairs from which I currently read the value for a key and display it as-is in the UI.
The complexity has increased: the value is now more dynamic and based on a formula. The formula includes a variable parameter whose value I only get at run time.
Is there any Java design pattern for this scenario?
I was thinking of putting a method name in the property file against each key.
I would then read the key and fetch the method name. That method would calculate the value for that particular key.
Please let me know your suggestions.
Is there any Java design pattern for this scenario?
I don't know if there is a pattern.
If I understand your question right I can explain what I do usually.
Insert localizable strings in my properties values
I usually use #number#
Replace it later when variables are resolved
Little example:
messages.properties
name.of.key = sum of #0# + #1# = #2#
Then I read the value from the properties file and replace each #num# with the appropriate value (NOTE: the replacement is done in the same method here for brevity, but I normally use an external replace method):
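public void printSum(int n1, int n2) {
    // Look up the template; the messageSource call is kept as written in the answer.
    String myString = messageSource("name.of.key", Locale.getDefault(), null, null);
    // String.replace returns a new String, so the result has to be assigned back.
    myString = myString.replace("#0#", String.valueOf(n1));
    myString = myString.replace("#1#", String.valueOf(n2));
    myString = myString.replace("#2#", String.valueOf(n1 + n2));
    System.out.println(myString);
}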
OUTPUT printSum(1,2);
sum of 1 + 2 = 3
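For reference, a self-contained variant of the same idea might look like the sketch below. It loads the template with java.util.Properties instead of a message source, and the names PlaceholderDemo, load and format are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PlaceholderDemo {
    /** Parses properties from text; a real application would read a file instead. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Replaces #0#, #1#, ... with the corresponding values. */
    static String format(String template, Object... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            result = result.replace("#" + i + "#", String.valueOf(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = load("name.of.key = sum of #0# + #1# = #2#");
        int n1 = 1;
        int n2 = 2;
        // prints: sum of 1 + 2 = 3
        System.out.println(format(props.getProperty("name.of.key"), n1, n2, n1 + n2));
    }
}
```

Note that the formula itself (n1 + n2) still lives in Java code here; the properties file only controls how the result is presented.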
ANTLR looks like a great fit here.
It is a parser generator. You give it grammar as an input and in return it provides you with a parser.
You can use the parser to transform the textual formula into a parsed tree representation. After that, you can run a visitor to evaluate each of the nodes. You just write some simple function to implement the behavior, such as:
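// AntlrNode stands in for whatever node type the generated parser provides.
public Double visitAdd(AntlrNode left, AntlrNode right) {
    // Evaluate both operands first, then combine them.
    Double leftValue = visit(left);
    Double rightValue = visit(right);
    return leftValue + rightValue;
}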
The grammar is very close to the familiar BNF notation. You just describe how your formula strings are. For example:
formula : left '+' right;
left: Number;
right: Number;
Number: [0-9]+;
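To show the parse-then-evaluate flow without pulling in the ANTLR toolchain, here is a hand-written stand-in for the tiny grammar above. It is not ANTLR-generated code; the class TinyFormula and its methods are hypothetical, and a generated parser would replace most of it.

```java
class TinyFormula {
    private final String input;
    private int pos = 0;

    private TinyFormula(String input) {
        this.input = input;
    }

    /** formula : Number '+' Number ; */
    static double evaluate(String text) {
        TinyFormula parser = new TinyFormula(text.replace(" ", ""));
        double left = parser.number();
        parser.expect('+');
        double right = parser.number();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return left + right;
    }

    /** Number : [0-9]+ ; */
    private double number() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + start);
        }
        return Double.parseDouble(input.substring(start, pos));
    }

    /** Consumes the given character or fails. */
    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2")); // prints 3.0
    }
}
```

Writing this by hand stops scaling as soon as the formulas need precedence, parentheses or variables, which is the point where a parser generator starts to pay off.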
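Use Java's built-in JavaScript engine (the javax.script API) to evaluate expressions. Note that the bundled Nashorn engine was deprecated in Java 11 and removed in Java 15, so on newer JDKs a script engine has to be added as a dependency. To match the spirit more closely, you can use JSON for properties.
If security is important, you need to provide a class filter. It can be very simple and restrictive, as you only need to evaluate trivial expressions. An example of a class filter can be found here.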
You can use the strategy pattern putting the method/algorithm name in the property file:
public interface IFormula{
public int formula(int a, int b);
}
public class Sum implements IFormula{
public int formula(int a, int b){
return a+b;
}
}
Then you can select the method getting the name from a property file:
public static IFormula getStrategy(Name name) {
switch (name) {
case SUM:
return new Sum();
...
}
}
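Put together, the selection could look roughly like this. The Name enum, the Product class and the fromProperty helper are assumptions added for the sketch; only IFormula and Sum come from the snippets above.

```java
import java.util.Locale;

interface IFormula {
    int formula(int a, int b);
}

class Sum implements IFormula {
    public int formula(int a, int b) {
        return a + b;
    }
}

// Hypothetical second strategy, added only to make the switch meaningful.
class Product implements IFormula {
    public int formula(int a, int b) {
        return a * b;
    }
}

// Hypothetical enum of the names allowed in the property file.
enum Name { SUM, PRODUCT }

class FormulaSelector {
    static IFormula getStrategy(Name name) {
        switch (name) {
            case SUM:
                return new Sum();
            case PRODUCT:
                return new Product();
            default:
                throw new IllegalArgumentException("No formula for " + name);
        }
    }

    /** Maps a raw property value such as "sum" to a strategy. */
    static IFormula fromProperty(String value) {
        return getStrategy(Name.valueOf(value.trim().toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Pretend "sum" was read from the property file for some key.
        System.out.println(fromProperty("sum").formula(2, 3)); // prints 5
    }
}
```

Name.valueOf throws IllegalArgumentException for an unknown name, so a typo in the property file fails fast instead of silently picking a default.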
Another solution is to refactor your map so that the value type is a functional interface whose method accepts an arbitrary parameter. For example:
@FunctionalInterface
interface ValueType<R> {
R eval(Object param);
}
This solution (or a variant of it) would enable you to associate a lambda with your keys rather than a fixed value. The performance of a lambda ought to be much better than a run-time parser while still affording you the flexibility to make the associated value depend upon a run-time argument.
This solution should also be less vulnerable to injection attacks than a solution based on run-time parsing.
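A minimal sketch of such a map is shown below, assuming the values are strings and the run-time parameter is an Integer. The keys, the DynamicValues class and the discount formula are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

@FunctionalInterface
interface ValueType<R> {
    R eval(Object param);
}

class DynamicValues {
    /** Builds the key-to-lambda map that replaces the plain key/value properties. */
    static Map<String, ValueType<String>> build() {
        Map<String, ValueType<String>> values = new HashMap<>();
        // A fixed value simply ignores the run-time parameter.
        values.put("greeting", param -> "Hello");
        // A computed value derives its text from the parameter (assumed to be an Integer).
        values.put("discount", param -> ((Integer) param) / 10 + "%");
        return values;
    }

    public static void main(String[] args) {
        Map<String, ValueType<String>> values = build();
        System.out.println(values.get("greeting").eval(null)); // prints Hello
        System.out.println(values.get("discount").eval(250));  // prints 25%
    }
}
```

The trade-off is that the formulas now live in compiled code rather than in the property file, so changing one requires a redeploy.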
Since you seem to want a name for the pattern... the pattern is called: Domain Specific Language.
And again, if you want to remain in the realm of abstract patterns and design, you can peruse Martin Fowler's discussion of the topic at length.
Needless to say, there are a metric ton of tools that solve the above pattern (including some of the answers here).
The other pattern, which I highly recommend you NOT use, is a general-purpose language that has an evaluator (e.g. JavaScript, EL, Groovy, etc.). This generally has security issues and performance issues (of course there are exceptions).
So, in most programming languages, if you are using a loop or an if, you can omit the curly braces when the body is a single statement. For example:
if (true)
//Single statement;
for (int i = 0; i < 10; i++)
//Single Statement
while (true)
//Single statement
However, it doesn't work for functions. For example:
void myFunction()
//Single Statement
So, my question, why doesn't it work for functions?
C++ needs it to disambiguate some constructs:
void Foo::bar() const int i = 5;
Now does the const belong to bar or i ?
Because the language grammar forbids it.
The Java grammar defines a method as follows:
MethodDeclaration:
MethodHeader MethodBody
and MethodBody as:
MethodBody:
Block
;
Which means either a Block (see below) or a single semicolon
Block:
{ BlockStatements_opt }
So a block is zero or more statements within curly brackets (the _opt suffix marks BlockStatements as optional).
However an if is defined as:
IfThenStatement:
if ( Expression ) Statement
Where no block is needed after the closing ) and therefore a single line is ok.
Why they chose to define it that way? One can only guess.
Grammar can be found here: http://docs.oracle.com/javase/specs/jls/se7/html/index.html
This is not a universal rule: in some languages you can (Python? Yes, I know that's a really contrived example :)), in others you cannot.
You could very well extend your question for example to class and namespaces, for example, why not:
namespace Example
class Foo : public Bar
public: std::string myMethod()
return "Oh noes!";
right? At each level, that's just a single item, so why not skip the braces everywhere?
The answer is at the same time simple and complex.
In simple terms, it's about readability. Remember that you can layout your code as you like, since whitespaces are usually discarded by the compiler:
namespace Example class Foo : public Bar public: std::string myMethod() return "Oh noes!";
Well, that starts looking unreadable. Notice that if you add the braces back
namespace Example { class Foo : public Bar { public: std::string myMethod() {return "Oh noes!";}}}
then it, strangely, becomes somewhat comprehensible.
The actual problem is not readability (who cares anyway? I'm joking, of course) but the latter: comprehension. Not only must you be able to comprehend the code, the compiler must too. And for the compiler there is no such thing as "oh, this looks like a function". The compiler must be absolutely sure that it is a function. It must also be completely sure about where it starts, where it ends, and so on. And it must do that without relying on whitespace too much, since C-family languages allow you to add it in any quantity you like.
So, let's look again at the packed-up no-braces example
namespace Example class Foo : public Bar public : std::string myMethod() return "Oh noes!";
                            ^                   ^    ^^
I've marked some problematic symbols. Assuming you could define a grammar that handles it, please note how the meaning of ":" character changes. At one time it's denoting that you're specifying inheritance, at other point it's specifying access modifier to a method, at third place it's just namespace qualifier. Ok, the third one could be discarded if you were smart and noticed it's actually '::' symbol, not just a ':' character.
Also, meaning of keywords can change:
namespace Example class Foo : public Bar public : std::string myMethod() return "Oh noes!";
                              ^^^^^^     ^^^^^^
In the first place, it defines the access modifier for the inherited base class; in the second place, it defines the access modifier for a method. What's more, in the first place it must not be followed by a ":" and in the second place it is required to be followed by one!
So many rules, exceptions and corner cases, and we have covered just two simple things: public and ':'. Now imagine you have to specify the grammar for the whole language. You describe everything the way you'd like it to be. But when you gather all the rules together, at some point they may start to overlap and collide with each other. After adding the Nth rule, it may happen that your 'compiler' is unable to tell whether 'public' actually marks inheritance or starts a method:
namespace Example class Foo : public ::Bar public : std::string myMethod() return "Oh noes!";
                              ^^^^^^^^     ^^^^^^^^
Note that I only changed Bar to ::Bar. I only added a namespace qualifier, and now our rule that "public is followed by a colon" is trashed. Since I have now added a rule that "base class names may have namespace qualifiers", I must also add more rules to cover yet more corner cases, to remove the ambiguity of the meaning of "public" and ":" in this place.
To cut a long story short: the more rules, the more problems you have. The "compiler" grows, gets slower, and eats more resources to work. This results in an inability to handle large code files, or in frustration when the user must wait oh-so-long for that module to compile.
But what's worse for the user is that the more complex or ambiguous the grammar, the worse the error messages are. No one wants to use a compiler that is unable to parse some code and also unable to tell you what's wrong with it.
Remember what happens in C++ when you forget a ';' in a .h file? Or when you forget a '}'? The compiler reports an error 30 or 300 lines further down. This is because the ';' and '{}' can be omitted in many places, and for those 30 or 300 lines the compiler simply does not yet know that something is wrong! Were the braces required everywhere, the point of error could be pinpointed faster.
The other way: making them optional at namespace, class, or function level, would remove the basic block-starts/block-ends markers and, at least:
could make the grammar ambiguous (and hence force to add more rules)
could hurt detecting (and reporting!) errors
neither of which anyone really wants.
The C++ grammar is so complex that it might actually be impossible to omit the braces in those places at all. For Java or plain C, I think it would be possible to build a grammar/compiler that does not require them, but it would still hurt error reporting a lot, especially in C, which allows #include and macros. In early Java the impact might be smaller, as its grammar is relatively simple compared, e.g., to current C++.
Probably the simplest, fastest, easiest-to-implement, and probably easiest-to-learn grammar would... require braces (or some other delimiters) just about everywhere. Look at LISP, for example. But then a large part of your work would consist of constantly writing the same required markers, which many language users simply do not like (e.g. I get nauseous when I need to work on some old VisualBasic code with its "if then end if", yuck).
Now, if you look at a brace-less language like Python, how does it solve this? It denotes block starts and block ends by... indentation. In this language you must indent your code properly. If you don't indent it correctly, it will not compile at all, or the loops/functions/etc. will silently get their code messed up, because the compiler cannot know which part belongs to which scope. No free lunch here either.
Basically, a method (function) is a collection of statements grouped together to perform an operation. We group the statements for reuse: if you know that a set of instructions will be used often, you create it as a separate function.
If you can perform the task in a single line of code, then why do you need to write a function?
Because the grammar of the language doesn't allow you to.
Here is the grammar for a function in C taken from the ISO/IEC 9899-1999 specification:
6.9.1 Function definitions
Syntax
1 function-definition:
declaration-specifiers declarator declaration-list_opt compound-statement
The compound-statement part is the body of a function, and a compound statement is declared as
compound-statement:
{ block-item-list_opt }
i.e. it starts and ends with braces.
An if, while or similar construct can have any statement as its body.
(6.8.5) iteration-statement:
while ( expression ) statement
A statement can be one of several constructs.
statement:
labeled-statement
compound-statement
expression-statement
selection-statement
iteration-statement
jump-statement
of which only compound-statement requires the braces.
In C++ a function body is a compound statement, which is surrounded by curly braces. That does not mean the braces have to come immediately after the declarator; the following (a function-try-block) will compile just fine:
int foobar()
try {
return 1;
}
catch (...){return 0;}
You can't really say there are no one-statement functions in C#. Anonymous methods could be counted as such. Without single-statement bodies we could not have lambda expressions in C#, and C# 3.0 wouldn't exist.
There is no reason to add that extra parsing code to the compiler, because the functionality is really useless. How many one-line methods have you written that are not accessors or mutators? This has been dealt with in C# via properties, but not yet in Java.
So the reason is, it's unlikely to be used considering most developers discourage leaving out optional bracket blocks anyway.
I am wondering what would be a regular expression that would detect any function declaration which has a body containing a call to itself in Java.
Example of a method that would match:
public int method()
{*n
method();
}*n
Thank you for any help.
Consider the following code samples:
public int method() {
System.out.println("method() started");
}
or
public int method() {
// this method() is just an example
}
Do you see now that you need a full-blown parser?
I don't see how this could be done reliably with a regular expression, since the arguments to any method call could be arbitrarily complex, and even include anonymous classes containing similarly named methods.
So, the answer is "no"; at least not if you want it to be reliable.
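To see the false positives from the two samples above in action, here is a sketch with a deliberately naive pattern of my own (it is not taken from any of the answers): a name, a parameter list, an opening brace, and the same name followed by "(" somewhere before the first closing brace.

```java
import java.util.regex.Pattern;

class NaiveRecursionCheck {
    // name(...) { ... name(   with no understanding of strings or comments
    private static final Pattern NAIVE = Pattern.compile(
            "(\\w+)\\s*\\([^)]*\\)\\s*\\{[^}]*\\b\\1\\s*\\(");

    static boolean looksRecursive(String source) {
        return NAIVE.matcher(source).find();
    }

    public static void main(String[] args) {
        String inString = "public int method() {\n"
                + "    System.out.println(\"method() started\");\n"
                + "}";
        String inComment = "public int method() {\n"
                + "    // this method() is just an example\n"
                + "}";
        String plain = "public int other() {\n"
                + "    return 1;\n"
                + "}";

        System.out.println(looksRecursive(inString));  // true, but only a string literal matched
        System.out.println(looksRecursive(inComment)); // true, but only a comment matched
        System.out.println(looksRecursive(plain));     // false
    }
}
```

Neither of the first two methods calls itself, yet both are reported, because the pattern has no notion of where literals and comments begin and end.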
This is a quick and dirty example. It would lead to many false positives and generally be slow. It doesn't take into account curly brackets and strings which could contain curlies. However, it works for your input. :-)
Matcher m = Pattern.compile("([\\w\\d_$]+\\s*\\([^)]*\\))[^{]*\\{.*\\1[^}]+}", Pattern.DOTALL | Pattern.MULTILINE).matcher(s1);
System.out.println(m.matches());
I often find myself searching for statements of a particular form in Java. Say I've written a simple function to express an idiom, such as "take this value, or a default value if it's null"
/** return a if not null, the default otherwise. */
public static <T> T notNull(T a, T def) {
if (a == null)
return def;
else
return a;
}
Now if I've written this, I want to look for cases in my code where it can be used to simplify, for instance
(some.longExpressionWhichMayBeNull() == null ? "default string" : some.longExpressionWhichMayBeNull())
The problem is that it's pretty tricky to write a regular expression that matches Java syntax. It can be done, of course, but it's easy to get wrong. It's hard to get regular expressions to ignore whitespace in all the right locations, always accurately figure out where strings start and stop, know the difference between a cast and a function call, etc.
It also seems a bit wasteful, since we already have a java parser, which does that already.
So my question is: is there some Java syntax aware alternative to regular expressions for searching for particular (sub-)expressions?
You'd probably need to build an abstract syntax tree of the Java source file(s) and then analyse that. It might be possible to leverage PMD (http://pmd.sourceforge.net/) and write a custom rule (http://pmd.sourceforge.net/pmd-5.0.5/howtowritearule.html) to detect and flag expressions that could be simplified as you describe.
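As one possible way to get at such a syntax tree without extra libraries, the sketch below uses the Compiler Tree API that ships with the JDK (the com.sun.source packages) rather than PMD. It only parses the source and reports conditional expressions of the form "x == null ? d : x". The class name NullTernaryFinder is hypothetical, the comparison is done on the printed form of the sub-expressions, and parenthesised conditions or "null == x" are not handled.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ConditionalExpressionTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class NullTernaryFinder {
    /** Returns the printed form of every "x == null ? d : x" expression in the source. */
    static List<String> find(String source) {
        // Requires a JDK: getSystemJavaCompiler() returns null when no compiler is available.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        List<String> hits = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) { // parse only, no type checking
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitConditionalExpression(ConditionalExpressionTree node, Void unused) {
                        ExpressionTree condition = node.getCondition();
                        if (condition.getKind() == Tree.Kind.EQUAL_TO) {
                            BinaryTree test = (BinaryTree) condition;
                            boolean comparesToNull =
                                    test.getRightOperand().getKind() == Tree.Kind.NULL_LITERAL;
                            boolean sameExpression = test.getLeftOperand().toString()
                                    .equals(node.getFalseExpression().toString());
                            if (comparesToNull && sameExpression) {
                                hits.add(node.toString());
                            }
                        }
                        return super.visitConditionalExpression(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "class Sample { String f(String a) { return a == null ? \"d\" : a; } }";
        find(source).forEach(System.out::println);
    }
}
```

Because the match runs on the parsed tree, whitespace, comments and string contents cannot confuse it the way they confuse a regular expression.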