I have a String that contains the source code of a class, and another String that contains the full name of a method in this class. The method name is e.g.
public void (java.lang.String test)
Now I want to retrieve the source code of this method from the string with the class's source code. How can I do that? With String#indexOf(methodName) I can find the start of the method's source code, but how do I find the end?
====EDIT====
I used the count curly-braces approach:
internal void retrieveSourceCode()
{
int startPosition = parentClass.getSourceCode().IndexOf(this.getName());
if (startPosition != -1)
{
String subCode = parentClass.getSourceCode().Substring(startPosition, parentClass.getSourceCode().Length - startPosition);
for (int i = 0; i < subCode.Length; i++)
{
String c = subCode.Substring(0, i);
int open = c.Split('{').Count() - 1;
int close = c.Split('}').Count() - 1;
if (open == close && open != 0)
{
sourceCode = c;
break;
}
}
}
Console.WriteLine("SourceCode for " + this.getName() + "\n" + sourceCode);
}
This works more or less fine. However, if a method is defined without a body, it fails. Any hints on how to solve that?
Counting braces and stopping when the count decreases to 0 is indeed the way to go. Of course, you need to take into account braces that appear as literals and should thus not be counted, e.g. braces in comments and strings.
Overall this is a thankless endeavour, comparable in complexity to, say, building a command-line parser, if you want it to work really reliably. If you know you can get away with it, you could cut some corners and just count all the braces, although I do not recommend it.
Update:
Here's some sample code to do the brace counting. As I said, this is a thankless job and there are tons of details you have to get right (in essence, you're writing a mini-lexer). It's in C#, as this is the closest to Java I can write code in with confidence.
The code below is not complete and probably not 100% correct (for example: verbatim strings in C# do not allow spaces between the @ and the opening quote, but did I know that for a fact or just forgot about it?).
// sourceCode is a string containing all the source file's text
var sourceCode = "...";
// startIndex is the index of the char AFTER the opening brace
// for the method we are interested in
var methodStartIndex = 42;
var openBraces = 1;
var insideLiteralString = false;
var insideVerbatimString = false;
var insideBlockComment = false;
var lastChar = ' '; // White space is ignored by the C# parser,
// so a space is a good "neutral" character
for (var i = methodStartIndex; openBraces > 0; ++i) {
var ch = sourceCode[i];
switch (ch) {
case '{':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
++openBraces;
}
break;
case '}':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
--openBraces;
}
break;
case '"':
if (insideBlockComment) {
continue;
}
if (insideLiteralString) {
// "Step out" of the string if this is the closing quote,
// i.e. stay inside only when the quote is escaped by a backslash
insideLiteralString = lastChar == '\\';
}
else if (insideVerbatimString) {
// If this quote is part of a two-quote pair, do NOT step out
// (it means the string contains a literal quote)
// This can throw, but only for source files with syntax errors
// I'm ignoring this possibility here...
var nextCh = sourceCode[i + 1];
if (nextCh == '"') {
++i; // skip that next quote
}
else {
insideVerbatimString = false;
}
}
else {
if (lastChar == '@') {
insideVerbatimString = true;
}
else {
insideLiteralString = true;
}
}
break;
case '/':
if (insideLiteralString || insideVerbatimString) {
continue;
}
// TODO: parse this
// It can start a line comment, if followed by /
// It can start a block comment, if followed by *
// It can end a block comment, if preceded by *
// Line comments are intended to be handled by just incrementing i
// until you see a CR and/or LF, hence no insideLineComment flag.
break;
}
lastChar = ch;
}
// From the values of methodStartIndex and i we can now do sourceCode.Substring and get the method source
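Since the question is about Java source, here is a rough Java sketch of the same brace-counting idea with the comment handling filled in. The names MethodBodyFinder, findBlockEnd and extractMethod are made up for this illustration. It assumes syntactically valid source, does not handle Java text blocks ("""), and treats "a semicolon appears before the first opening brace" as the sign of a method without a body, which is only a heuristic.

```java
final class MethodBodyFinder {

    /**
     * Returns the index just past the '}' that closes the block opened at
     * openBraceIndex, or -1 if the braces never balance.
     */
    static int findBlockEnd(String src, int openBraceIndex) {
        int depth = 0;
        int i = openBraceIndex;
        while (i < src.length()) {
            char ch = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            if (ch == '/' && next == '/') {
                // Line comment: skip to the end of the line
                int eol = src.indexOf('\n', i);
                if (eol < 0) return -1;
                i = eol + 1;
            } else if (ch == '/' && next == '*') {
                // Block comment: skip past the closing */
                int end = src.indexOf("*/", i + 2);
                if (end < 0) return -1;
                i = end + 2;
            } else if (ch == '"' || ch == '\'') {
                // String or char literal: skip to the matching unescaped quote
                i++;
                while (i < src.length() && src.charAt(i) != ch) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i++; // step past the closing quote
            } else {
                if (ch == '{') depth++;
                else if (ch == '}' && --depth == 0) return i + 1;
                i++;
            }
        }
        return -1;
    }

    /**
     * Returns the source of the method whose text starts with signature,
     * or null if it cannot be found. A method without a body is returned
     * up to and including its terminating semicolon.
     */
    static String extractMethod(String src, String signature) {
        int start = src.indexOf(signature);
        if (start < 0) return null;
        int brace = src.indexOf('{', start);
        int semi = src.indexOf(';', start);
        if (semi >= 0 && (brace < 0 || semi < brace)) {
            return src.substring(start, semi + 1); // abstract or interface method
        }
        if (brace < 0) return null;
        int end = findBlockEnd(src, brace);
        return end < 0 ? null : src.substring(start, end);
    }

    public static void main(String[] args) {
        String src = "abstract class A {\n"
                + "  void f() { String s = \"}\"; /* } */ }\n"
                + "  abstract void g();\n"
                + "}\n";
        System.out.println(extractMethod(src, "void f()"));
        System.out.println(extractMethod(src, "abstract void g()"));
    }
}
```

Skipping whole comments and literals with indexOf keeps the state down to a single depth counter, instead of one boolean flag per kind of literal.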
Have a look at: Parser for C#
It recommends using NRefactory to parse and tokenise source code; you should be able to use that to navigate your class source and pick out methods.
You will probably have to know the order in which the methods appear in the code file. Then you can look for the method's closing brace }, which should be the last one before the start of the next method.
So your code might look like:
nStartOfMethod = String.indexOf(methodName)
nStartOfNextMethod = String.indexOf(NextMethodName)
Look for .LastIndexOf(yourMethodTerminator /* probably a } */, ...) in the text between nStartOfMethod and nStartOfNextMethod
If you don't know the order of the methods, you might skip over a method in between and end up at the wrong closing brace.
I have the following string :
bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla
I would like to match
{{bla {{bla bla {{afsaasg}} }} blabla}}
with a regex.
but my regex
{{(.*?)}}
matches
{{bla {{bla bla}}
Can anyone help?
Additional info: I expect to have no more than 2 brackets at the same time.
Finally I solved this with my own Java function. Perhaps this will help someone:
public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {
ArrayList<String> alReturn = new ArrayList<String>();
boolean ok1 = true;
boolean ok2 = true;
int iStartCount = 0;
int iEndCount = 0;
int iStartSearching = 0;
while (ok1) {
int iAnfang = sText.indexOf(sBegin, iStartSearching);
ok2 = true;
if (iAnfang > -1) {
while (ok2) {
int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
int iEndCharacter = sText.indexOf(sEnd, iStartSearching);
if (iEndCharacter == -1) {
// Nothing found . stop
ok2 = false;
ok1 = false;
} else if (iStartCharacter < iEndCharacter && iStartCharacter != -1) {
// found startpattern
iStartCount = iStartCount + 1;
iStartSearching = iStartCharacter + sBegin.length();
} else if (iStartCharacter > iEndCharacter && iEndCharacter != -1 || (iStartCharacter == -1 && iEndCharacter != -1)) {
iEndCount = iEndCount + 1;
iStartSearching = iEndCharacter + sEnd.length();
} else {
if (iStartCharacter < 0) {
// No End found . stop
ok2 = false;
}
}
if (iEndCount == iStartCount) {
// found the pattern
ok2 = false;
// cut
int iEnde = iStartSearching;// +sEnd.length();
String sReturn = sText.substring(iAnfang, iEnde);
alReturn.add(sReturn);
}
}
} else {
ok1 = false;
}
}
return alReturn;
}
I call it:
ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
System.out.println(" sTest : " + alTest.get(0));
.NET has special support for nested item matching, so {{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} would do what you wanted in C# to any level of nesting, but not Java.
Don't you need to escape the curly braces? I do in notepad++. Anyway, this should do it
\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}
You can't do this with regular expressions; it is a consequence of the pumping lemma. You need to use context-free grammars, or perhaps dedicated tools (like XML/DOM/... parsers).
You can indeed parse this for, say, three levels deep, but you can't make it work for an arbitrary number of levels. Even then, it's better to use context-free grammars (for example via an LALR compiler-compiler), simply because these are the tools designed to parse such structures.
In other words, If one day, someone can enter {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}, and this is supposed to be valid, it will most likely fail.
One sidenote:
If the nesting is known to be at most i levels deep, you can use a regex like:
for 1: .*?(.*?\{\{.*?\}\}.*?)*.*?
for 2: .*?(.*?\{\{.*?(.*?\{\{.*?\}\}.*?)*.*?\}\}.*?)*.*?
...
But as you can see, the deeper you go, the longer the regex gets, and there is no way to parse them for arbitrary depth.
See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.
As you noted, some regular expression toolkits do provide tools to count things. These can be found in the P-languages (PHP, Perl,...). Strictly speaking, these aren't regular expressions (as defined by Kleene, see this Wikipedia article about what a real regex is) but simplified parsers, because they don't describe a regular language. They are currently not available in most regex libraries, including Java's. Some libraries even provide Turing-complete parsers, which can parse anything you can parse algorithmically, but that is not really recommended for advanced tasks...
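For the concrete {{ ... }} case in Java, a small hand-written scanner with a depth counter is enough, and it works for any nesting depth. The following is a minimal sketch, not a general parser; the class and method names (NestedDelimiters, outermost) are made up for this example, and unbalanced closing delimiters are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class NestedDelimiters {

    /** Returns every outermost open...close span in text, delimiters included. */
    static List<String> outermost(String text, String open, String close) {
        List<String> result = new ArrayList<>();
        int depth = 0;   // how many delimiters are currently open
        int start = -1;  // index where the current outermost span began
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith(open, i)) {
                if (depth == 0) start = i;
                depth++;
                i += open.length();
            } else if (depth > 0 && text.startsWith(close, i)) {
                depth--;
                i += close.length();
                if (depth == 0) result.add(text.substring(start, i));
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "bla {{bla {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla";
        for (String match : outermost(s, "{{", "}}")) {
            System.out.println(match);
        }
    }
}
```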
I need to develop a new method that replaces all umlauts (ä, ö, ü) in an input string with the corresponding HTML escape codes, with high performance. According to statistics, only 5% of all input strings contain umlauts. As the method is expected to be used extensively, any unnecessary instantiation should be avoided.
Could someone show me a way to do it?
These are the HTML escape codes. Additionally, HTML features arbitrary escaping with numeric codes of the format &#NNNN; (decimal) and equivalently &#xHHHH; (hexadecimal).
A simple string-replace is not going to be efficient with so many strings to replace. I suggest you split the string by entity matches, such as this:
String[] parts = str.split("&([A-Za-z]+|[0-9]+|x[A-Fa-f0-9]+);", -1); //Limit -1 keeps trailing empty parts, so an entity at the very end is not lost.
if(parts.length <= 1) return str; //No matched entities.
Then you can re-build the string with the replaced parts inserted.
StringBuilder result = new StringBuilder(str.length());
result.append(parts[0]); //First part always exists.
int pos = parts[0].length() + 1; //Skip past the first part and the ampersand.
for(int i = 1;i < parts.length;i++) {
String entityName = str.substring(pos,str.indexOf(';',pos));
if(entityName.matches("x[A-Fa-f0-9]+") && entityName.length() <= 5) {
result.append((char)Integer.parseInt(entityName.substring(1), 16)); //Hexadecimal code.
} else if(entityName.matches("[0-9]+")) {
result.append((char)Integer.parseInt(entityName)); //Decimal code; parseInt does not treat a leading 0 as octal.
} else {
switch(entityName) {
case "euml": result.append('ë'); break;
case "auml": result.append('ä'); break;
...
default: result.append("&" + entityName + ";"); //Unknown entity. Give the original string.
}
}
result.append(parts[i]); //Append the text after the entity.
pos += entityName.length() + parts[i].length() + 2; //Skip past the entity name, the semicolon and the following part.
}
return result.toString();
Rather than copy-pasting this code, type it in your own project by hand. This gives you the opportunity to look at how the code actually works. I didn't run this code myself, so I can't guarantee it being correct. It can also be made slightly more efficient by pre-compiling the regular expressions.
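For the direction the question actually asks about (umlaut to escape code), a single pass that only allocates once the first umlaut is seen keeps the common 95% case allocation-free. This is a sketch under assumptions: the class name UmlautEscaper is made up, the uppercase umlauts are included even though the question lists only ä, ö, ü, and the characters are written as Unicode escapes so the source file encoding does not matter.

```java
final class UmlautEscaper {

    /** Returns s itself when it contains no umlaut, otherwise an escaped copy. */
    static String escapeUmlauts(String s) {
        StringBuilder sb = null; // created lazily, only when an umlaut is found
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = entityFor(c);
            if (entity != null) {
                if (sb == null) {
                    sb = new StringBuilder(s.length() + 16);
                    sb.append(s, 0, i); // copy everything before the first umlaut
                }
                sb.append(entity);
            } else if (sb != null) {
                sb.append(c);
            }
        }
        return sb == null ? s : sb.toString();
    }

    private static String entityFor(char c) {
        switch (c) {
            case '\u00E4': return "&auml;"; // ä
            case '\u00F6': return "&ouml;"; // ö
            case '\u00FC': return "&uuml;"; // ü
            case '\u00C4': return "&Auml;"; // Ä
            case '\u00D6': return "&Ouml;"; // Ö
            case '\u00DC': return "&Uuml;"; // Ü
            default: return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeUmlauts("M\u00FCller"));
        System.out.println(escapeUmlauts("plain text"));
    }
}
```

Strings without umlauts are returned as the same object, so the caller pays only for the scan.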
I am writing a program that is going to read a string from a file, and then remove anything that isn't 1-9 or A-Z or a-z. The A-Z values need to become lowercase. Everything seems to run fine, I have no errors, however my output is messed up. It seems to skip certain characters for no reason whatsoever. I've looked at it and tweaked it but nothing works. Can't figure out why it is randomly skipping certain characters because I believe my if statements are correct. Here is the code:
String dataIn;
int temp;
String newstring= "";
BufferedReader file = new BufferedReader(new FileReader("palDataIn.txt"));
while((dataIn=file.readLine())!=null)
{
newstring="";
for(int i=0;i<dataIn.length();i++)
{
temp=(int)dataIn.charAt(i);
if(temp>46&&temp<58)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>96&&temp<123)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>64&&temp<91)
{
newstring=newstring+Character.toLowerCase(dataIn.charAt(i));
}
i++;
}
System.out.println(newstring);
}
So to give you an example, the first string I read in is :
A sample line this is.
The output after my program runs through it is this:
asmlietis
So it reads the A and makes it lowercase, skips the space like it is supposed to, reads the s, but then for some reason skips the "a" and the "m" and goes to the "p".
You're incrementing i at the end of the loop body (the i++; statement) as well as in the main loop "header". As a result, i advances by two on every iteration and every other character is skipped.
Just get rid of all the i++; statements other than the one in the for statement declaration. For example:
newstring="";
for(int i=0;i<dataIn.length();i++)
{
temp=(int)dataIn.charAt(i);
if(temp>46&&temp<58)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>96&&temp<123)
{
newstring=newstring+dataIn.charAt(i);
}
if(temp>64&&temp<91)
{
newstring=newstring+Character.toLowerCase(dataIn.charAt(i));
}
}
I wouldn't stop editing there though. I'd also:
Use a char instead of an int as the local variable for the current character you're looking at
Use character literals for comparisons, to make it much clearer what's going on
Use a StringBuilder to build up the string
Declare the variable for the output string for the current line within the loop
Use if / else if to make it clear you're only expecting to go into one branch
Combine the two paths that both append the character as-is
Fix the condition for numbers (it's incorrect at the moment)
Use more whitespace for clarity
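Watch out for locale when lowercasing, to avoid "the Turkey problem" with I: Character.toLowerCase(char) is locale-independent, but String.toLowerCase() without a Locale argument uses the default locale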
So:
String line;
while((line = file.readLine()) != null)
{
StringBuilder builder = new StringBuilder(line.length());
for (int i = 0; i < line.length(); i++) {
char current = line.charAt(i);
// Are you sure you want to trim 0?
if ((current >= '1' && current <= '9') ||
(current >= 'a' && current <= 'z')) {
builder.append(current);
} else if (current >= 'A' && current <= 'Z') {
builder.append(Character.toLowerCase(current)); // locale-independent; Character.toLowerCase has no Locale overload
}
}
System.out.println(builder);
}
I'm writing a Java program in which I'm checking a list against a string, and then doing stuff to that. In Fortran I'd write something along the lines of
where(list(:)==stringToCheck){
...
statements
...
}
Instead I have a headache of a block of for-loops, if statements and breaks all over the place. Now perhaps I could neaten the code a little, but it still feels far more inefficient than Fortran.
Edit, this is the code I've resorted to:
for(int idx=0;idx<player.get_charactersOwned().size();idx++)
{
if(player.get_charactersOwned().get(idx).get_characterName().equals(charName))
{
/* Add character to the game
* Add game to the character*/
System.out.println("Character "+charName+" Found ");
gameToMake.addCharacters(player.get_charactersOwned().get(idx));
player.get_charactersOwned().get(idx).addGame(gameToMake);
break;
}else
{
System.err.println("Character "+ charName +" not found");
System.out.println("Shall I add that Character? y/n ");
choice = scanner.nextLine();
if(choice.equalsIgnoreCase("y"))
{
charName = scanner.nextLine();
Character character = new Character(charName);
characterTempList.add(character);
player.addCharacter(characterTempList);
gameToMake.addCharacters(player.get_charactersOwned().get(idx));
player.get_charactersOwned().get(idx).addGame(gameToMake);
break;
}else{break;}
}
}
As tempting as it is to fix this code, I'd much rather use a workaround.
Is there a Java equivalent of this without the use of external libraries?
No, there isn't an equivalent in Java. Instead if you need to check if a list of characters (each with a name) contains a character name then simply do this:
// search the name
boolean found = false;
for (Character c : player.get_charactersOwned()) {
if (c.get_characterName().equals(charName)) {
found = true;
break;
}
}
// perform the check
if (found) {
// do something
} else {
// do something else
}
And by the way, Character is a bad name for your class, it clashes with Java's own Character class. Rename it if possible, to avoid confusion. Alternatively, the loop could have been written like this:
boolean found = false;
for (int i = 0, n = player.get_charactersOwned().size(); i < n && !found; i++) {
Character c = player.get_charactersOwned().get(i);
if (c.get_characterName().equals(charName)) {
found = true;
}
}
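If you are on Java 8 or later (an assumption, the question does not say), the standard library's streams get somewhat closer to the Fortran one-liner without any external library. In this sketch GameCharacter and getName are hypothetical stand-ins for the asker's Character class and get_characterName.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FindByName {

    // Hypothetical stand-in for the asker's Character class
    static final class GameCharacter {
        private final String name;
        GameCharacter(String name) { this.name = name; }
        String getName() { return name; }
    }

    /** Returns the first owned character with the given name, if any. */
    static Optional<GameCharacter> findByName(List<GameCharacter> owned, String name) {
        return owned.stream()
                .filter(c -> c.getName().equals(name))
                .findFirst();
    }

    public static void main(String[] args) {
        List<GameCharacter> owned = Arrays.asList(
                new GameCharacter("Ann"), new GameCharacter("Bob"));
        Optional<GameCharacter> hit = findByName(owned, "Bob");
        if (hit.isPresent()) {
            System.out.println("Character " + hit.get().getName() + " found");
        } else {
            System.out.println("Character not found");
        }
    }
}
```

The found / not found decision is made once, after the search, rather than inside the loop for every element.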
What I'm doing is building a production rule as a string over a finite alphabet, copying it into a char array, and then running it through if statements that call functions depending on the character.
Such as "lff[f]" which calls functionL, functionF, functionF, functionOpenB, functionF, functionCloseB.
So currently it's:
workRuleArr= stringProdrule.toCharArray();
for (char c=0; c < workRuleArr.length; c++){
if (workRuleArr[c] == 'f')
{
functionF();
}
if (workRuleArr[c] == 'l')
{
functionL();
}
etc
This is fine and working, however:
How would I pass parameters to those functions from the production rule, such as "l(100)ff..", so that it would call functionL(x) with x = 100?
There may be many different values for x in the same production rule string. The user inputs the rule in one go at the start of the program, so it would need to deal with multiple parameters in the same production rule.
Any ideas would be appreciated, if the question is not clear let me know. Thanks
Seems to me that those "rule functions" you have all serve a similar purpose, so they obey the same interface,
like
interface Rule{
void executeRule();
}
Then you have different Rules that implement that interface, like
class RuleF implements Rule{
public void executeRule(){ // interface methods must be implemented as public
//execute rule F
}
}
class RuleL implements Rule{
public void executeRule(){
//execute rule L
}
}
Then you need a simple way to associate a character with a given rule.
Use a HashMap, like:
Map<Character, Rule> ruleMap = new HashMap<Character, Rule>();
ruleMap.put('f', new RuleF());
ruleMap.put('l', new RuleL());
With that you can remove all those "ifs", like
workRuleArr= stringProdrule.toCharArray();
for (int c=0; c < workRuleArr.length; c++){
Rule rule = ruleMap.get(workRuleArr[c]);
if (rule != null) { // skip characters that have no rule registered
rule.executeRule();
}
}
If the Rule interface needs to receive a parameter,
you'll also need a similar Map object to associate the character with the parameter you need to pass.
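One way this could look, as a sketch only: since the same character can appear with different values in one production rule, the parameter is kept per occurrence (a Step) rather than per character, and the map is used only to pick the rule. Step, the changed Rule signature, and the log list are all assumptions made for this illustration, and the steps are assumed to be parsed already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParameterizedRules {

    // Variant of the Rule interface above that takes a parameter (hypothetical)
    interface Rule {
        void execute(int param);
    }

    // One occurrence of a rule character together with its own parameter
    static final class Step {
        final char symbol;
        final int param;
        Step(char symbol, int param) { this.symbol = symbol; this.param = param; }
    }

    /** Runs the steps in order and returns a log of the calls that were made. */
    static List<String> run(List<Step> steps) {
        List<String> log = new ArrayList<>();
        Map<Character, Rule> rules = new HashMap<>();
        rules.put('f', p -> log.add("F(" + p + ")"));
        rules.put('l', p -> log.add("L(" + p + ")"));
        for (Step s : steps) {
            Rule rule = rules.get(s.symbol);
            if (rule != null) { // characters without a rule are skipped
                rule.execute(s.param);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
                new Step('l', 100), new Step('f', 0), new Step('l', 20))));
    }
}
```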
What exactly are you building? Some sort of state machine?
cheers, good luck! :-)
Take a look at Command Pattern. You are basically asking for the answer that this question
It looks like you need some sort of parser that will give you valid tokens, or you will need to jury-rig your own. Whatever the case, you'll need to look one step ahead in your array.
Here's yet another implementation:
String rule = "FLL(123)FKF";
String pattern = "[a-zA-Z]{1}|[(\\d)]+"; //any single character or set of (numbers)
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(rule);
String command = "", param = "", token;
while(m.find()){
token = m.group();
if(token.length() > 1){ //must be parameter
param = token.substring(1, token.length()-1);
continue;
}
if(!command.isEmpty()){ // compare string contents, not references
runCommand(command, param);
param = ""; //clear
}
command = token; //set next
}
if(!command.isEmpty()) //last trailing run
runCommand(command, param);
Also need to define runCommand:
boolean runCommand(String command, String param){
System.out.println("execute function" + command + "(" + param + ")");
boolean hasParam = !param.isEmpty();
int p = hasParam ? Integer.parseInt(param) : 0; // parse only when a parameter was given
boolean success;
switch(command){
case "F":
success = (hasParam ? functionF(p) : functionF());
break;
case "L":
success = (hasParam ? functionL(p) : functionL());
break;
case "K":
success = (hasParam ? functionK(p) : functionK());
break;
default:
success = false; // unknown command
}
return success;
}
Try a simple if statement to test whether there is a parameter or not:
workRuleArr= stringProdrule.toCharArray();
for (int c=0; c < workRuleArr.length; c++){
if (workRuleArr[c] == 'f') {
// chars are compared with ==, and we must not read past the end of the array
if(c + 1 < workRuleArr.length && workRuleArr[c+1] == '(') {
// use the parameter
String param = "";
c++; // c now points at '('
// assumes a closing ')' exists; otherwise this runs off the end of the array
while(workRuleArr[c+1] != ')') {
param += workRuleArr[c+1];
c++;
}
int yourParameter = Integer.parseInt(param);
functionF(yourParameter);
}
else {
functionF();
}
}
if (workRuleArr[c] == 'l'){
functionL();
}
}
Please note, I haven't tested the code, there can be some errors.
I wouldn't convert into a char array, because String has more and useful methods, also for this case.
So, for each character with index i you need to test if the char at i+1 is '(', then use int k = stringProdrule.indexOf(')', i+2) to get the closing parenthesis and pass the string between both parentheses as parameter to the function. Then continue at k+1.
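The indexOf approach described above could be sketched roughly like this. RuleScanner and scan are made-up names, and instead of calling the real functions the sketch just records which call it would make, so it runs on its own; it assumes every parameter is a plain integer.

```java
import java.util.ArrayList;
import java.util.List;

final class RuleScanner {

    /** Turns a rule such as "l(100)ff" into the list of calls it stands for. */
    static List<String> scan(String rule) {
        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < rule.length()) {
            char symbol = rule.charAt(i);
            if (i + 1 < rule.length() && rule.charAt(i + 1) == '(') {
                int k = rule.indexOf(')', i + 2); // closing parenthesis
                if (k < 0) {
                    throw new IllegalArgumentException("Unclosed '(' at index " + (i + 1));
                }
                int param = Integer.parseInt(rule.substring(i + 2, k));
                calls.add(symbol + "(" + param + ")");
                i = k + 1; // continue right after the ')'
            } else {
                calls.add(symbol + "()");
                i++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(scan("l(100)ff"));
    }
}
```

In real code the calls.add lines would be replaced by the dispatch to functionL, functionF and so on.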