I have the following string :
bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla
I would like to match
{{bla {{bla bla {{afsaasg}} }} blabla}}
with a regex.
but my regex
{{(.*?)}}
matches
{{bla {{bla bla}}
Can anyone help?
Additional info: I expect to have no more than 2 brackets at the same time.
In the end I solved this with my own Java function. Perhaps this will help someone:
public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {
ArrayList<String> alReturn = new ArrayList<String>();
boolean ok1 = true;
boolean ok2 = true;
int iStartCount = 0;
int iEndCount = 0;
int iStartSearching = 0;
while (ok1) {
int iAnfang = sText.indexOf(sBegin, iStartSearching);
ok2 = true;
if (iAnfang > -1) {
while (ok2) {
int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
int iEndCharacter = sText.indexOf(sEnd, iStartSearching);
if (iEndCharacter == -1) {
// Nothing found . stop
ok2 = false;
ok1 = false;
} else if (iStartCharacter < iEndCharacter && iStartCharacter != -1) {
// found startpattern
iStartCount = iStartCount + 1;
iStartSearching = iStartCharacter + sBegin.length();
} else if (iStartCharacter > iEndCharacter && iEndCharacter != -1 || (iStartCharacter == -1 && iEndCharacter != -1)) {
iEndCount = iEndCount + 1;
iStartSearching = iEndCharacter + sEnd.length();
} else {
if (iStartCharacter < 0) {
// No End found . stop
ok2 = false;
}
}
if (iEndCount == iStartCount) {
// found the pattern
ok2 = false;
// cut
int iEnde = iStartSearching;// +sEnd.length();
String sReturn = sText.substring(iAnfang, iEnde);
alReturn.add(sReturn);
}
}
} else {
ok1 = false;
}
}
return alReturn;
}
I call it:
ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
System.out.println(" sTest : " + alTest.get(0));
.NET has special support for nested item matching (balancing groups), so {{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} would do what you want in C# for any level of nesting, but it does not work in Java.
Don't you need to escape the curly braces? I do in notepad++. Anyway, this should do it
\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}
You can't do this with regular expressions. It is a consequence of the pumping lemma. You need to use context-free grammars, or perhaps dedicated tools (like XML/DOM/... parsers).
You can indeed parse this for, say, three levels deep, but you can't make this work for an arbitrary number of levels. Even then, it's better to use context-free grammars (like a LALR compiler compiler), simply because these are the tools designed to parse such structures.
In other words, if one day someone can enter {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}, and this is supposed to be valid, it will most likely fail.
One sidenote:
Say the nesting is at most i levels deep; then you can use a regex like:
for 1: .*?(.*?\{\{.*?\}\}.*?)*.*?
for 2: .*?(.*?\{\{.*?(.*?\{\{.*?\}\}.*?)*.*?\}\}.*?)*.*?
...
But as you can see, the deeper you go, the longer the regex gets, and there is no way to parse them for arbitrary depth.
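To make the fixed-depth idea concrete, here is a minimal Java sketch that builds such a regex programmatically for a chosen maximum depth. The class and method names (BoundedNesting, blockRegex, findBlocks) are made up for illustration, and the pattern is a variant of the ones above that uses a negative lookahead so that plain text can never swallow a delimiter. It still only works up to the depth you pick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class BoundedNesting {
    // One character that does not start a "{{" or "}}" delimiter.
    private static final String PLAIN = "(?:(?!\\{\\{|\\}\\}).)";

    // Regex for a {{...}} block nested at most maxDepth levels (maxDepth >= 1).
    static String blockRegex(int maxDepth) {
        String block = "\\{\\{" + PLAIN + "*\\}\\}"; // depth 1: no inner blocks
        for (int depth = 2; depth <= maxDepth; depth++) {
            // A deeper block may contain plain characters or shallower blocks.
            block = "\\{\\{(?:" + PLAIN + "|" + block + ")*\\}\\}";
        }
        return block;
    }

    // Returns every outermost block that fits within maxDepth levels.
    static List<String> findBlocks(String text, int maxDepth) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(blockRegex(maxDepth), Pattern.DOTALL).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla";
        System.out.println(findBlocks(text, 3));
    }
}
```

With a depth of 3 this finds both top-level blocks of the string from the question. The regex text grows with every extra level, which is exactly the limitation described above.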
See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.
As you noted, some regular expression toolkits do provide tools to count things. These can be found in the P-languages (PHP, Perl, ...). Strictly speaking these aren't regular expressions (as defined by Kleene, see this Wikipedia article about what a real regex is) but simplified parsers, because they don't describe a regular language. They are currently not available in most regex libraries, including Java's. Some of the libraries even provide Turing-complete parsers, parsers that can parse anything you can parse algorithmically, but they are not really recommended for advanced tasks...
Related
I want to get the count of conditions in a control structure for a given code segment, line by line. Can someone help me get the correct output?
public int[] countCon() {
char opArray[] ={'<','>','=','!'};
String[] lines = code.split("\\r?\\n");
int[] score = new int[lines.length];
int s = 0;
score[s] = 0;
for(String line : lines) {
String tline = line;
if(tline.contains("if") || tline.contains("else if") || tline.contains("while") || tline.contains("do") || tline.contains("for") || tline.contains("switch") || tline.contains("case ")){
if (line.indexOf("if") != -1 ) {
for(int i = 0;i<=opArray.length-1;i++) {
//tline.contains.new String(opArray[i]);
opArray[i]++;
if(tline.contains("<") || tline.contains(">") || tline.contains("<=") || tline.contains(">=") || tline.contains("==") || tline.contains("!")) {
score[s] = score[s]+1;
}
else {
score[s]=0;
}
}
}
}
}
}
This is not a trivial task, since the input can be written in very different styles. The simplest way, if you are disregarding method calls, is to use regular expressions that match branching keywords such as if (including else if), while, for, etc.
If you don't want to use regex, your code can still be improved: there is no need to consider nested conditions or boolean expressions inside the branching construct, because no operator except the ternary (exp ? a : b) introduces branching.
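As a rough sketch of the regex approach, the following counts branching keywords per line. The class name BranchCounter and the keyword set (if, while, for, case) are my own assumptions, and keywords that appear inside comments or string literals are counted too, so treat it as a starting point rather than a finished metric.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class BranchCounter {
    // \b keeps identifiers such as "notify" or "format" from matching.
    private static final Pattern BRANCH =
            Pattern.compile("\\b(?:if|while|for|case)\\b");

    // Returns one score per source line: the number of branching keywords on it.
    static int[] countPerLine(String code) {
        String[] lines = code.split("\\r?\\n");
        int[] score = new int[lines.length];
        for (int i = 0; i < lines.length; i++) {
            Matcher m = BRANCH.matcher(lines[i]);
            while (m.find()) {
                score[i]++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String code = "if (a) {\n} else if (b) {\nint x = 1;\nwhile (x < 3) for (;;) {}";
        System.out.println(Arrays.toString(countPerLine(code))); // prints [1, 1, 0, 2]
    }
}
```

Note that "else if" counts once, because only the if keyword matches.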
Is it possible to have multiple arguments for a .contains? I am searching an array to ensure that each string contains one of several characters. I've hunted all over the web, but found nothing useful.
for(String s : fileContents) {
if(!s.contains(syntax1) && !s.contains(syntax2)) {
found.add(s);
}
}
for (String s : found) {
System.out.println(s); // print array to cmd
JOptionPane.showMessageDialog(null, "Note: Syntax errors found.");
}
How can I do this with multiple arguments? I've also tried a bunch of ||s on their own, but that doesn't seem to work either.
No, it can't have multiple arguments, but the || should work.
!s.contains(syntax1+"") || !s.contains(syntax2+"") means s doesn't contain syntax1 or it doesn't contain syntax2.
This is just a guess but you might want s contains either of the two:
s.contains(syntax1+"") || s.contains(syntax2+"")
or maybe s contains both:
s.contains(syntax1+"") && s.contains(syntax2+"")
or maybe s contains neither of the two:
!s.contains(syntax1+"") && !s.contains(syntax2+"")
If syntax1 and syntax2 are already strings, you don't need the +""'s.
I believe s.contains("") should always return true, so you can remove it.
It seems that what you described can be done with a regular expression.
In a regular expression, the operator | means that one of several alternatives must match.
For example, the regex (a|b) means a or b.
The regex ".*(a|b).*" matches a string that contains a or b, and anything else around it is fine (it assumes a single-line string, but that can be dealt with easily if needed).
Code example:
String s = "abc";
System.out.println(s.matches(".*(a|d).*"));
s = "abcd";
System.out.println(s.matches(".*(a|d).*"));
s = "fgh";
System.out.println(s.matches(".*(a|d).*"));
Regular expressions are a powerful tool that I recommend learning. Have a look at this tutorial, you might find it helpful.
There is no such thing as a contains with multiple arguments.
If you need to check that every string in a list is contained in some other string, you must iterate through them all and check each one.
public static boolean containsAll(String input, String... items) {
if(input == null) throw new IllegalArgumentException("Input must not be null"); // We validate the input
if(input.length() == 0) {
return items.length == 0; // if empty contains nothing then true, else false
}
boolean result = true;
for(String item : items) {
result = result && input.contains(item);
}
return result;
}
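If you are on Java 8 or later, the same checks can be written with streams. This is only a sketch; the class name ContainsHelpers and the three method names are made up here, and String.contains does the actual work.

```java
import java.util.Arrays;

// Hypothetical helper class, shown as one possible shape.
class ContainsHelpers {
    // True if input contains at least one of the items.
    static boolean containsAny(String input, String... items) {
        return Arrays.stream(items).anyMatch(input::contains);
    }

    // True if input contains every item (also true for an empty item list).
    static boolean containsAll(String input, String... items) {
        return Arrays.stream(items).allMatch(input::contains);
    }

    // True if input contains none of the items.
    static boolean containsNone(String input, String... items) {
        return Arrays.stream(items).noneMatch(input::contains);
    }

    public static void main(String[] args) {
        System.out.println(containsAny("abc", "a", "d"));  // true
        System.out.println(containsAll("abc", "a", "d"));  // false
        System.out.println(containsNone("fgh", "a", "d")); // true
    }
}
```

For the loop in the question, containsNone(s, syntax1, syntax2) would select the lines that have neither marker.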
I'm writing a Java program in which I'm checking a list against a string, and then doing stuff to that. In fortran I'd write something along the lines of
where(list(:)==stringToCheck){
...
statements
...
}
Instead I have a headache of a block of for-loops, if statements and breaks all over the place. Now, perhaps I could neaten the code a little, but it still feels far more inefficient than Fortran.
Edit, this is the code I've resorted to:
for(int idx=0;idx<player.get_charactersOwned().size();idx++)
{
if(player.get_charactersOwned().get(idx).get_characterName().equals(charName))
{
/* Add character to the game
* Add game to the character*/
System.out.println("Character "+charName+" Found ");
gameToMake.addCharacters(player.get_charactersOwned().get(idx));
player.get_charactersOwned().get(idx).addGame(gameToMake);
break;
}else
{
System.err.println("Character "+ charName +" not found");
System.out.println("Shall I add that Character? y/n ");
choice = scanner.nextLine();
if(choice.equalsIgnoreCase("y"))
{
charName = scanner.nextLine();
Character character = new Character(charName);
characterTempList.add(character);
player.addCharacter(characterTempList);
gameToMake.addCharacters(player.get_charactersOwned().get(idx));
player.get_charactersOwned().get(idx).addGame(gameToMake);
break;
}else{break;}
}
}
As tempting as it is to fix this code, I'd much rather use a workaround.
Is there a Java equivalent of this without the use of external libraries?
No, there isn't an equivalent in Java. Instead if you need to check if a list of characters (each with a name) contains a character name then simply do this:
// search the name
boolean found = false;
for (Character c : player.get_charactersOwned()) {
if (c.get_characterName().equals(charName)) {
found = true;
break;
}
}
// perform the check
if (found) {
// do something
} else {
// do something else
}
And by the way, Character is a bad name for your class, it clashes with Java's own Character class. Rename it if possible, to avoid confusion. Alternatively, the loop could have been written like this:
boolean found = false;
for (int i = 0, n = player.get_charactersOwned().size(); i < n && !found; i++) {
Character c = player.get_charactersOwned().get(i);
if (c.get_characterName().equals(charName)) {
found = true;
}
}
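If Java 8 or later is available, the standard library's streams get somewhat closer to the Fortran where construct. Below is a minimal sketch; GameCharacter and findByName are hypothetical stand-ins for the classes in the question, not the asker's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class CharacterLookup {
    // Hypothetical stand-in for the question's Character class.
    static class GameCharacter {
        private final String name;

        GameCharacter(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Returns the first owned character with the given name, if there is one.
    static Optional<GameCharacter> findByName(List<GameCharacter> owned, String name) {
        return owned.stream()
                .filter(c -> c.getName().equals(name))
                .findFirst();
    }

    public static void main(String[] args) {
        List<GameCharacter> owned =
                Arrays.asList(new GameCharacter("Ann"), new GameCharacter("Bob"));
        Optional<GameCharacter> match = findByName(owned, "Bob");
        if (match.isPresent()) {
            System.out.println("Character " + match.get().getName() + " found");
        } else {
            System.out.println("Character not found");
        }
    }
}
```

The found and not-found branches then sit side by side after the search, instead of inside the loop body.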
What I'm doing is making a production rule in the form of a string over a finite alphabet, copying it into a char array and then running it through an if statement which calls functions depending on the character.
Such as "lff[f]" which calls functionL, functionF, functionF, functionOpenB, functionF, functionCloseB.
So currently it's:
workRuleArr= stringProdrule.toCharArray();
for (char c=0; c < workRuleArr.length; c++){
if (workRuleArr[c] == 'f')
{
functionF();
}
if (workRuleArr[c] == 'l')
{
functionL();
}
etc
This is fine and working, however:
How would I pass parameters to those functions from the production rule, such as "l(100)ff..", so that it would call functionL(x) where x = 100 and pass 100 to the function?
There may be many different values for x in the same production rule string. The user inputs the rule in one go at the start of the program, so it would need to deal with multiple parameters in the same production rule.
Any ideas would be appreciated. If the question is not clear, let me know. Thanks
Seems to me that those "rule functions" you have all serve a similar purpose, so they obey the same interface,
like
interface Rule{
void executeRule();
}
Then you have different Rules that implement that interface, like
class RuleF implements Rule{
    // interface methods are implicitly public, so the implementation must be public too
    public void executeRule(){
        //execute rule F
    }
}
class RuleL implements Rule{
    public void executeRule(){
        //execute rule L
    }
}
Then you need a simple way to associate a character with a given rule.
Use a HashMap, like:
Map<Character, Rule> ruleMap = new HashMap<Character, Rule>();
ruleMap.put('f', new RuleF()); // Map uses put, not add
ruleMap.put('l', new RuleL());
With that you can remove all those "ifs", like
workRuleArr = stringProdrule.toCharArray();
for (int i = 0; i < workRuleArr.length; i++) {
    Rule rule = ruleMap.get(workRuleArr[i]);
    if (rule != null) { // skip characters that have no rule registered
        rule.executeRule();
    }
}
Now if you need that the Rule interface receives a parameter,
you'll also need a similar Map object to associate the character with the parameter you need to pass.
What exactly are you building? Some sort of state machine?
cheers, good luck! :-)
Take a look at Command Pattern. You are basically asking for the answer that this question
Looks like you need some sort of parser that'll give you valid tokens, or you will need to jerry-rig your own. Whatever the case, you'll need to look one step ahead in your array.
Here's yet another implementation:
// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String rule = "FLL(123)FKF";
String pattern = "[a-zA-Z]{1}|[(\\d)]+"; //any single character or set of (numbers)
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(rule);
String command = "", param = "", token;
while(m.find()){
    token = m.group();
    if(token.length() > 1){ //must be parameter
        param = token.substring(1, token.length()-1);
        continue;
    }
    if(!command.isEmpty()){ // compare string contents, not references
        runCommand(command, param);
        param = ""; //clear
    }
    command = token; //set next
}
if(!command.isEmpty()) //last trailing run
    runCommand(command, param);
Also need to define runCommand:
// Assumes functionF/functionL/functionK exist with and without an int parameter and return boolean.
static boolean runCommand(String command, String param){
    System.out.println("execute function" + command + "(" + param + ")");
    boolean hasParam = !param.isEmpty();
    // only parse when a parameter was given; parseInt("") would throw
    int p = hasParam ? Integer.parseInt(param) : 0;
    switch(command){
        case "F":
            return hasParam ? functionF(p) : functionF();
        case "L":
            return hasParam ? functionL(p) : functionL();
        case "K":
            return hasParam ? functionK(p) : functionK();
        default:
            return false; // unknown command
    }
}
Try a simple if statement to test whether there is a parameter or not:
workRuleArr = stringProdrule.toCharArray();
for (int c = 0; c < workRuleArr.length; c++) {
    if (workRuleArr[c] == 'f') {
        // chars are primitives: compare with ==, and check the bounds before looking ahead
        if (c + 1 < workRuleArr.length && workRuleArr[c + 1] == '(') {
            // use the parameter
            String param = "";
            c++; // c now points at '('
            while (workRuleArr[c + 1] != ')') {
                param += workRuleArr[c + 1];
                c++;
            }
            c++; // skip the closing ')'
            int yourParameter = Integer.parseInt(param);
            functionF(yourParameter);
        }
        else {
            functionF();
        }
    }
    if (workRuleArr[c] == 'l') {
        functionL();
    }
}
Please note, I haven't tested the code, there can be some errors.
I wouldn't convert to a char array, because String has more useful methods, including for this case.
So, for each character with index i you need to test if the char at i+1 is '(', then use int k = stringProdrule.indexOf(')', i+2) to get the closing parenthesis and pass the string between both parentheses as parameter to the function. Then continue at k+1.
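A sketch of that indexOf approach, combined with a lookup map for dispatch, could look like the following. Everything named here is an assumption made for illustration: the RuleRunner class, the DEFAULT_PARAM used when a symbol has no "(n)" suffix, and the trace helper that records calls instead of invoking the real functionL/functionF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical sketch; names and the default parameter are assumptions.
class RuleRunner {
    static final int DEFAULT_PARAM = 1;

    // Walks the rule and calls the action registered for each symbol.
    static void run(String rule, Map<Character, IntConsumer> actions) {
        int i = 0;
        while (i < rule.length()) {
            char symbol = rule.charAt(i);
            int param = DEFAULT_PARAM;
            int next = i + 1;
            if (next < rule.length() && rule.charAt(next) == '(') {
                int close = rule.indexOf(')', next + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Missing ')' after index " + next);
                }
                param = Integer.parseInt(rule.substring(next + 1, close));
                next = close + 1; // continue after the closing parenthesis
            }
            IntConsumer action = actions.get(symbol);
            if (action != null) { // symbols without an action are skipped
                action.accept(param);
            }
            i = next;
        }
    }

    // Demo helper: records which calls would be made for a rule.
    static List<String> trace(String rule) {
        List<String> log = new ArrayList<>();
        Map<Character, IntConsumer> actions = new HashMap<>();
        actions.put('l', x -> log.add("L(" + x + ")"));
        actions.put('f', x -> log.add("F(" + x + ")"));
        run(rule, actions);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace("l(100)ff")); // prints [L(100), F(1), F(1)]
    }
}
```

Several parameters in one rule, such as "l(100)f(5)l(20)", are handled by the same loop because each symbol looks ahead on its own.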
I have a String that contains the source code of a class, and another String that contains the full name of a method in this class. The method name is e.g.
public void (java.lang.String test)
Now I want to retrieve the source code of this method from the string with the class's source code. How can I do that? With String#indexOf(methodName) I can find the start of the method's source code, but how do I find the end?
====EDIT====
I used the count curly-braces approach:
internal void retrieveSourceCode()
{
int startPosition = parentClass.getSourceCode().IndexOf(this.getName());
if (startPosition != -1)
{
String subCode = parentClass.getSourceCode().Substring(startPosition, parentClass.getSourceCode().Length - startPosition);
for (int i = 0; i < subCode.Length; i++)
{
String c = subCode.Substring(0, i);
int open = c.Split('{').Count() - 1;
int close = c.Split('}').Count() - 1;
if (open == close && open != 0)
{
sourceCode = c;
break;
}
}
}
Console.WriteLine("SourceCode for " + this.getName() + "\n" + sourceCode);
}
This works more or less fine. However, if a method is defined without a body, it fails. Any hints on how to solve that?
Counting braces and stopping when the count decreases to 0 is indeed the way to go. Of course, you need to take into account braces that appear as literals and should thus not be counted, e.g. braces in comments and strings.
Overall this is kind of a thankless endeavour, comparable in complexity to, say, building a command line parser, if you want to get it working really reliably. If you know you can get away with it, you could cut some corners and just count all the braces, although I do not recommend it.
Update:
Here's some sample code to do the brace counting. As I said, this is a thankless job and there are tons of details you have to get right (in essence, you're writing a mini-lexer). It's in C#, as this is the closest to Java I can write code in with confidence.
The code below is not complete and probably not 100% correct (for example: verbatim strings in C# do not allow spaces between the @ and the opening quote, but do I know that for a fact, or did I just forget about it?)
// sourceCode is a string containing all the source file's text
var sourceCode = "...";
// methodStartIndex is the index of the char AFTER the opening brace
// for the method we are interested in
var methodStartIndex = 42;
var openBraces = 1;
var insideLiteralString = false;
var insideVerbatimString = false;
var insideBlockComment = false;
var lastChar = ' '; // White space is ignored by the C# parser,
// so a space is a good "neutral" character
for (var i = methodStartIndex; openBraces > 0; ++i) {
var ch = sourceCode[i];
switch (ch) {
case '{':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
++openBraces;
}
break;
case '}':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
--openBraces;
}
break;
case '"':
if (insideBlockComment) {
continue;
}
if (insideLiteralString) {
// "Step out" of the string if this is the closing quote
insideLiteralString = lastChar == '\\'; // stay inside only if the quote is escaped
}
else if (insideVerbatimString) {
// If this quote is part of a two-quote pair, do NOT step out
// (it means the string contains a literal quote)
// This can throw, but only for source files with syntax errors.
// I'm ignoring this possibility here...
var nextCh = sourceCode[i + 1];
if (nextCh == '"') {
++i; // skip that next quote
}
else {
insideVerbatimString = false;
}
}
else {
if (lastChar == '@') { // C# verbatim strings start with @"
insideVerbatimString = true;
}
else {
insideLiteralString = true;
}
}
break;
case '/':
if (insideLiteralString || insideVerbatimString) {
continue;
}
// TODO: parse this
// It can start a line comment, if followed by /
// It can start a block comment, if followed by *
// It can end a block comment, if preceded by *
// Line comments are intended to be handled by just incrementing i
// until you see a CR and/or LF, hence no insideLineComment flag.
break;
}
lastChar = ch;
}
// From the values of methodStartIndex and i we can now do sourceCode.Substring and get the method source
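For the body-less case raised in the question, one possible approach is to stop at the first semicolon that appears before any opening brace. Here is a minimal Java sketch of that idea, using the corner-cutting variant that counts every brace; the class and method names are hypothetical, and braces inside strings, char literals and comments are not filtered out.

```java
// Hypothetical helper; a naive sketch, not a real parser.
class MethodExtractor {
    // Returns the text from startIndex to the end of the method:
    // up to ';' for a declaration without a body, otherwise up to the
    // brace that closes the body. Returns null if no end is found.
    static String extract(String source, int startIndex) {
        int depth = 0;
        boolean bodySeen = false;
        for (int i = startIndex; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (ch == ';' && !bodySeen) {
                // A semicolon before any '{' means there is no body.
                return source.substring(startIndex, i + 1);
            } else if (ch == '{') {
                depth++;
                bodySeen = true;
            } else if (ch == '}') {
                depth--;
                if (depth < 0) {
                    return null; // closing brace before any opening brace
                }
                if (depth == 0) {
                    return source.substring(startIndex, i + 1);
                }
            }
        }
        return null; // ran off the end: unbalanced braces
    }

    public static void main(String[] args) {
        String source = "abstract void a(); void b() { if (x) { y(); } } void c() {}";
        System.out.println(extract(source, source.indexOf("void a")));
        System.out.println(extract(source, source.indexOf("void b")));
    }
}
```

To make it reliable you would still need the literal and comment handling from the sample above.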
Have a look at: Parser for C#
It recommends using NRefactory to parse and tokenise source code, you should be able to use that to navigate your class source and pick out methods.
You will probably have to know the order in which the methods are listed in the code file, so that you can look for the method's closing brace }, which should be right above the start of the next method.
So your code might look like:
nStartOfMethod = String.indexOf(methodName)
nStartOfNextMethod = String.indexOf(NextMethodName)
Look for .LastIndexOf(yourMethodTerminator /*probably a }*/, ...) in the substring between nStartOfMethod and nStartOfNextMethod.
If you don't know the order of the methods, you might end up skipping a method in between when looking for the ending brace.