Regex match for object creation in Java [closed] - java

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am searching the body of the constructor(s) of classes to find out any composition association between classes. So I have decided to use the following regex to perform this matching:
String regex = "(" + "^[this.]" + attribute.getName() + "=" + "|" + "^[this.]" + attribute.getName() + " =" + "|" + "^" + attribute.getName() + "=" + "|" + "^" + attribute.getName() + "=" + ")";
A class member might be initialized in the constructor in the following formats:
this.objectName =...; (with empty space)
this.objectName=...; (without empty space)
objectName =...; (with empty space)
objectName=...; (without empty space)
However, my regex doesn't work as I expected.
Let's assume that the field Student student; is initialized in the following constructor:
public Submission(long studentID, Date dateSubmitted, float grade){
this.studentID = studentID;
this.student= new Student();
this.grade = grade;
this.dateSubmitted = dateSubmitted;
}
And my regex should be dynamically created as:
String regex = "( ^[this.]student= | ^[this.]student = | ^student= | ^student = )";
Assuming that body is the content of the Submission(..) constructor:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(body);
if (m.find()) {
    ((ClassStructure) abstractStructure).addAssociation(AssociationType.COMPOSITION, node.getName());
    isFound = true;
}
isFound should be true once this code has run, but it is not.
Could you please tell me how I should rewrite the regex?

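You don't get the desired results because ^ anchors the match to the very beginning of the input (not to the beginning of each line, unless Pattern.MULTILINE is set), and [this.] is a character class that matches a single character, not the literal text this. followed by a dot. Try changing your regex to this:
String regexTwo = "( (?:this\\.)?" + attribute.getName() + "= | " + "(?:this\\.)?" + attribute.getName() + " = )";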

I have solved the problem with the help of Andreas and Erdi Izgi. The following regex works perfectly:
String regexTwo = "( (?:this\\.)?" + attribute.getName() + "= | " + "(?:this\\.)?" + attribute.getName() + " = )";
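Note that the spaces inside the pattern above are literal, so it only matches when the assignment is both preceded and followed by a space. As a hedged sketch (the class and method names here are hypothetical and not part of the original posts), a more tolerant variant could use \s* around the = sign, quote the attribute name with Pattern.quote, and reject == comparisons:

```java
import java.util.regex.Pattern;

class AssignmentFinder {
    // Hypothetical helper: returns true if `body` contains an assignment to
    // the field `name`, written as "name =" or "this.name =", with any
    // amount of whitespace around the '='.
    static boolean assignsField(String body, String name) {
        String regex = "(?<![\\w.])"        // not inside another identifier, not after "other."
                + "(?:this\\s*\\.\\s*)?"    // optional "this."
                + Pattern.quote(name)       // the field name, taken literally
                + "\\s*=(?!=)";             // '=' but not "=="
        return Pattern.compile(regex).matcher(body).find();
    }

    public static void main(String[] args) {
        String body = "this.studentID = studentID;\n"
                + "this.student= new Student();\n"
                + "this.grade = grade;\n";
        System.out.println(assignsField(body, "student"));       // true
        System.out.println(assignsField(body, "dateSubmitted")); // false
    }
}
```

This sketch would also match a local variable declaration such as Student student = ...;, so it is only an approximation of "the field is initialized in the constructor".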

Related

I want to delete the last two letters from a verb and conjugate it using StringBuilder [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Everything is explained in the comments
Scanner sc=new Scanner(System.in);
//asking for the verb
System.out.println("Donnez un verbe regulier du premier groupe :");
//putting the verb in the string
String chaine=sc.nextLine();
//taking the two last elements
char first=chaine.charAt(chaine.length());
char second=chaine.charAt(chaine.length()-1);
//creating a StringBuilder:
StringBuilder sb=new StringBuilder();
//putting string in the builder :
sb.append(chaine);
//deleting the two last characters :
sb.deleteCharAt(sb.length()-1);
//printing elemnts :
System.out.println("Je "+sb.append("e"));
System.out.println("Tu "+sb.append("es"));
System.out.println("Il/Elle "+sb.append("e"));
System.out.println("nous "+sb.append("ons"));
System.out.println("vous "+sb.append("ez"));
System.out.println("Ils/Elles "+sb.append("ent"));
I got StringIndexOutOfBoundsException.
I can elaborate further if you have questions.
String radical = chaine.replaceFirst("..$", "");
System.out.println("Je " + radical + "e");
System.out.println("Tu " + radical + "es");
System.out.println("Il/Elle " + radical + "e");
System.out.println("Nous " + radical + "ons");
System.out.println("Vous " + radical + "ez");
System.out.println("Ils/Elles " + radical + "ent");
This uses replaceFirst with a regular expression. The pattern reads:
. any character
. any character
$ end of string
Hence: the last two letters are replaced by the empty string.
The minor advantage over chaine.substring(0, chaine.length() - 2)
is that for an empty or one-letter string no indexing error happens;
nothing is replaced. Admittedly, substring is faster.
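The difference between the two approaches on very short input can be seen in a small sketch (the class and method names are made up for illustration):

```java
class RadicalDemo {
    // Removes the last two characters, if there are at least two.
    static String radical(String verb) {
        return verb.replaceFirst("..$", "");
    }

    public static void main(String[] args) {
        System.out.println(radical("parler")); // parl
        System.out.println(radical("a"));      // a (the pattern does not match, nothing is replaced)
        try {
            "a".substring(0, "a".length() - 2); // end index is -1
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring threw " + e.getClass().getSimpleName());
        }
    }
}
```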
Your code is full of errors.
First:
char first=chaine.charAt(chaine.length());
char second=chaine.charAt(chaine.length()-1);
This will throw an exception because the index of the last character of chaine is chaine.length()-1, so chaine.charAt(chaine.length()) is out of bounds. You should have written:
char first=chaine.charAt(chaine.length()-1);
char second=chaine.charAt(chaine.length()-2);
or perhaps:
char first=chaine.charAt(chaine.length()-2);
char second=chaine.charAt(chaine.length()-1);
but you don't seem to do anything with first and second.
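Next, StringBuilder.append appends its argument to the builder in place, so the endings accumulate from one line to the next (and the single deleteCharAt call removed only one character, not two). I don't think that's what you want.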
Perhaps you wanted to do:
String prefix = chaine.substring(0, chaine.length()-2);
System.out.println("Je " + prefix + "e");
System.out.println("Tu " + prefix + "es");
System.out.println("Il/Elle " + prefix + "e");
System.out.println("Nous " + prefix + "ons");
System.out.println("Vous " + prefix + "ez");
System.out.println("Ils/Elles " + prefix + "ent");
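If a StringBuilder is required, as the title suggests, one possible sketch is to truncate the builder to the stem with setLength and reset it to that length before each append. The class name, method name, and the length check are assumptions added for illustration:

```java
class Conjugator {
    // Returns the six present-tense lines for a regular first-group verb.
    static String[] conjugate(String verb) {
        if (verb.length() < 2) {
            throw new IllegalArgumentException("verb too short: " + verb);
        }
        String[] pronouns = {"Je", "Tu", "Il/Elle", "Nous", "Vous", "Ils/Elles"};
        String[] endings = {"e", "es", "e", "ons", "ez", "ent"};

        StringBuilder sb = new StringBuilder(verb);
        sb.setLength(sb.length() - 2);   // drop the last two characters ("er")
        int stemLength = sb.length();

        String[] lines = new String[pronouns.length];
        for (int i = 0; i < pronouns.length; i++) {
            sb.setLength(stemLength);    // undo the previous append
            sb.append(endings[i]);
            lines[i] = pronouns[i] + " " + sb;
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : conjugate("parler")) {
            System.out.println(line);
        }
    }
}
```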

java Dangling meta character '+' [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have text:
String str = "<HTML> <HEAD>\n" +
"<TITLE>Управление разрывом строк </Title>\n" +
"</HEAD>\n" +
"<BODY>\n" +
"<marquee>Это моя учебная страница.</marquee>\n" +
"<H2>Меня зовут <kbd>Ольга. </kbd></H2>\n" +
"<H3 align=\"center\">Я живу во <em>Владивостоке</em>.</H3>\n" +
"<H4 align=\"right\">Моя маленькая родина - <font face=\"Academy\" color=\"Red\">Сахалин</font>. </H4>\n" +
"<H5 align=left>ДВГУ - ВУЗ в котором я работаю.</H5>\n" +
"<B>Здесь</B>\n" +
"<I>продемонстрированы</I>\n" +
"<Blink>различные</Blink>\n" +
"<U> способы </U>\n" +
"<KBD>управления </KBD>\n" +
"<FONT SIZE=5 COLOR=FF80C0>шрифтом:</FONT> его\n" +
"<FONT SIZE=5 COLOR=FF00FF>цветом</FONT> и\n" +
"<FONT SIZE=+3 COLOR=FF00FF>размером.</FONT>\n" +
"</BODY> </HTML>";
I wrote the regex (?<=(=))[+a-zA-Z0-9]+(?=(>| )), which finds these matches:
left
5
FF80C0
5
FF00FF
+3
FF00FF
But Java throws an exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+3
^
at java.util.regex.Pattern.error(Pattern.java:1955)
at java.util.regex.Pattern.sequence(Pattern.java:2123)
...
OK, so I tried escaping the +: (?<=(=))[\\+a-zA-Z0-9]+(?=(>| ))
But this does not work either (same error). Why?
Code:
Matcher matcher = Pattern.compile("(?<=(=))[+a-zA-Z0-9]+(?=(>| ))").matcher(str);
while (matcher.find()) {
str= str.replaceAll(matcher.group(),'"' + matcher.group() + '"');
}
Your error has nothing to do with the regex shown.
The problem is that you pass the matched values as arguments to replaceAll(): the first argument is itself compiled as a regular expression, and the replacement string gives special meaning to $ and \.
Since you don't want them to be interpreted that way, you need to escape them, or rather "quote" them, like this:
str = str.replaceAll(Pattern.quote(matcher.group()),
Matcher.quoteReplacement('"' + matcher.group() + '"'));
UPDATE
However, if you just want to put double-quotes around the matched strings, why don't you just use replaceAll() directly? Like this:
str = str.replaceAll("(?<==)([+a-zA-Z0-9]+)(?=[> ])", "\"$1\"");
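A minimal runnable sketch of the single replaceAll() call, using a shortened sample string rather than the full page from the question (the class and method names are hypothetical):

```java
class QuoteAttributes {
    // Wraps unquoted attribute values (letters, digits, '+') in double quotes.
    static String quoteAttributeValues(String html) {
        return html.replaceAll("(?<==)([+a-zA-Z0-9]+)(?=[> ])", "\"$1\"");
    }

    public static void main(String[] args) {
        String html = "<FONT SIZE=+3 COLOR=FF00FF>text</FONT>";
        // prints <FONT SIZE="+3" COLOR="FF00FF">text</FONT>
        System.out.println(quoteAttributeValues(html));
    }
}
```

Values that are already quoted, such as align="center", are left alone because the character after = is not in the character class.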
Your issue does not come from your pattern; it comes from
replaceAll(...)
replaceAll takes a regex as its first argument.
That argument comes from your str, and at some point it is "+3", which starts with a dangling +.
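The failure and the quoting fix can be reproduced in isolation. This is only a sketch; the helper names are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DanglingPlusDemo {
    // True if the string is a syntactically valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Replaces every literal occurrence of `literal` with the same text in double quotes.
    static String quoteLiteral(String text, String literal) {
        return text.replaceAll(Pattern.quote(literal),
                Matcher.quoteReplacement('"' + literal + '"'));
    }

    public static void main(String[] args) {
        System.out.println(compiles("+3"));                // false: dangling '+'
        System.out.println(compiles(Pattern.quote("+3"))); // true: treated as literal text
        System.out.println(quoteLiteral("SIZE=+3 COLOR=FF00FF", "+3")); // SIZE="+3" COLOR=FF00FF
    }
}
```

When no regex features are needed at all, String.replace(CharSequence, CharSequence) performs a literal replacement and avoids the quoting step.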

Digits are getting deleted when splitting a string

I have a string from which I need to remove all of the listed punctuation characters and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
But, I am getting some elements which are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: normal
- is a special character inside regex character classes. For instance, [a-z] matches all characters from a to z inclusive. Note that an unescaped )-_ in your regex is a range from ) to _, which in ASCII includes the digits and the uppercase letters.
- defines a range in the regular expression that String.split takes as its argument, so it needs to be escaped:
String[] part = line.toLowerCase().split("[,/?:;\"{}()\\-_+*=|<>!`~##$%^&]");
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
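Both effects discussed above can be checked with a short sketch: an unescaped )-_ range swallows digits, and moving the + quantifier outside the character class collapses runs of delimiters so that no blank elements appear. The class name and the CLASS_BODY constant are made up for this example; the character class itself is the one from the posts, with the duplicated # dropped:

```java
import java.util.Arrays;

class SplitDemo {
    // Contents of the character class, with '-' escaped.
    static final String CLASS_BODY = ",/?:;\\[\\]\"{}()\\-_+*=|<>!`~#$%^&\\s";

    // Splits on every single delimiter character: adjacent delimiters yield empty strings.
    static String[] splitEach(String s) {
        return s.split("[" + CLASS_BODY + "]");
    }

    // Splits on runs of delimiter characters: no empty strings between words.
    static String[] splitRuns(String s) {
        return s.split("[" + CLASS_BODY + "]+");
    }

    public static void main(String[] args) {
        String s = "s[film] fever(normal) curse;";
        System.out.println(Arrays.toString(splitEach(s))); // [s, film, , fever, normal, , curse]
        System.out.println(Arrays.toString(splitRuns(s))); // [s, film, fever, normal, curse]

        // Unescaped ")-_" is a range that contains the digits.
        System.out.println(Arrays.toString("ab12cd".split("[)-_]+")));   // [ab, cd]
        System.out.println(Arrays.toString("ab12cd".split("[)\\-_]+"))); // [ab12cd]
    }
}
```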

Extracting Capture Group from Non-Capture Group in Java

I have a string, let's call it output, that equals the following:
ltm data-group internal str_testclass {
records {
baz {
data "value 1"
}
foobar {
data "value 2"
}
topaz {}
}
type string
}
And I'm trying to extract the substring between the quotes for a given "record" name. So given foobar, I want to extract value 2. The substring I want to extract will always come in the form shown above: after the "record" name there is whitespace, an opening brace, a newline, whitespace, the string data, and then the substring I want to capture between the quotes. The one exception is when there is no value, which will always look like topaz above: after the "record" name there is just an opening and a closing brace, and in that case I'd like to get an empty string. How could I write a line of Java to capture this? So far I have:
String myValue = output.replaceAll("(?:foobar\\s{\n\\s*data "([^\"]*)|()})","$1 $2");
But I'm not sure where to go from here.
Let's start by extracting the "records" structure with the following regex: ltm\s+data-group\s+internal\s+str_testclass\s*\{\s*records\s*\{\s*(?<records>([^\s}]+\s*\{\s*(data\s*"[^"]*")?\s*\}\s*)*)\}\s*type\s*string\s*\}
Then, within the "records" group, repeatedly match [^\s}]+\s*\{\s*(?:data\s*"(?<data>[^"]*)")?\s*\}\s*. The "data" group contains what you're looking for and will be null in the "topaz" case.
Java strings:
"ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}"
"[^\\s}]+\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*"
Demo:
String input =
"ltm data-group internal str_testclass {\n" +
" records {\n" +
" baz {\n" +
" data \"value 1\"\n" +
" }\n" +
" foobar {\n" +
" data \"value 2\"\n" +
" }\n" +
" topaz {}\n" +
" empty { data \"\"}\n" +
" }\n" +
" type string\n" +
"}";
Pattern language = Pattern.compile("ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}");
Pattern record = Pattern.compile("(?<name>[^\\s}]+)\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*");
Matcher lgMatcher = language.matcher(input);
if (lgMatcher.matches()) {
String records = lgMatcher.group("records");
Matcher rdMatcher = record.matcher(records);
while (rdMatcher.find()) {
System.out.printf("%s:%s%n", rdMatcher.group("name"), rdMatcher.group("data"));
}
} else {
System.err.println("Language not recognized");
}
Output:
baz:value 1
foobar:value 2
topaz:null
empty:
Alternatives: since you're parsing a custom language, you could try writing an ANTLR grammar or creating a Groovy DSL.
Your regex shouldn't even compile, because you are not escaping the " inside your regex String, so the String literal ends at the first " inside your regex.
Instead, try this regex:
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
You can check out how it works here on regex101.
Try something like this getRecord() method where key is the record 'name' you're searching for, e.g. foobar, and the input is the string you want to search through.
public static void main(String[] args) {
String input = "ltm data-group internal str_testclass { \n" +
" records { \n" +
" baz { \n" +
" data \"value 1\" \n" +
" } \n" +
" foobar { \n" +
" data \"value 2\" \n" +
" }\n" +
" topaz {}\n" +
" } \n" +
" type string \n" +
"}";
String bazValue = getRecord("baz", input);
String foobarValue = getRecord("foobar", input);
String topazValue = getRecord("topaz", input);
System.out.println("Record data value for 'baz' is '" + bazValue + "'");
System.out.println("Record data value for 'foobar' is '" + foobarValue + "'");
System.out.println("Record data value for 'topaz' is '" + topazValue + "'");
}
private static String getRecord(String key, String input) {
String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
//if we find a record with data return it
return matcher.group(1);
} else {
//else see if the key exists with empty {}
final Pattern keyPattern = Pattern.compile(key);
Matcher keyMatcher = keyPattern.matcher(input);
if (keyMatcher.find()) {
//return empty string if key exists with empty {}
return "";
} else {
//else handle error, throw exception, etc.
System.err.println("Record not found for key: " + key);
throw new RuntimeException("Record not found for key: " + key);
}
}
}
Output:
Record data value for 'baz' is 'value 1'
Record data value for 'foobar' is 'value 2'
Record data value for 'topaz' is ''
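As a hedged alternative to the two-step lookup in getRecord(), the data part can be made optional in a single pattern, so that an empty {} record and a record with data are handled by the same match. The class and method names below are hypothetical, and returning null for a missing record is an assumption made for the sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordLookup {
    // Returns the quoted data value of the named record, "" if the record
    // has an empty body, or null if no such record is found.
    static String recordValue(String output, String name) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(name)
                + "\\s*\\{\\s*(?:data\\s*\"([^\"]*)\")?\\s*\\}");
        Matcher m = p.matcher(output);
        if (!m.find()) {
            return null;
        }
        String data = m.group(1);   // null when the optional data part is absent
        return data == null ? "" : data;
    }

    public static void main(String[] args) {
        String output = "ltm data-group internal str_testclass {\n"
                + "    records {\n"
                + "        baz {\n"
                + "            data \"value 1\"\n"
                + "        }\n"
                + "        foobar {\n"
                + "            data \"value 2\"\n"
                + "        }\n"
                + "        topaz {}\n"
                + "    }\n"
                + "    type string\n"
                + "}";
        System.out.println("[" + recordValue(output, "foobar") + "]"); // [value 2]
        System.out.println("[" + recordValue(output, "topaz") + "]");  // []
    }
}
```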
You could try (note that { must be escaped in a Java regex):
(?:foobar\s\{\s*data "(.*)")
I don't think replaceAll() is necessary here. Would something like this work?
String var1 = "foobar";
String regex = "(?:" + var1 + "\\s\\{\\n\\s*data \"([^\"]*)\")";
You can then pass this regex to Pattern and Matcher to find the substring.
You can simply turn this into a method so that you can pass in the record name to search for:
public static void SearchString(String str)
{
String regex = "(?:" + str + "\\s\\{\\n\\s*data \"([^\"]*)\")";
}

Avoid overlapping regex matching in Java

For some reason this piece of Java code is giving me overlapping matches:
Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
Is there any way or option to avoid detecting overlaps? For example,
leftContext rightContext rightContext
should be 1 match instead of 2.
Here's the complete code:
public static String replaceWithContext(String input, String leftContext, String rightContext, String newString){
Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
Matcher matcher = pat.matcher(input);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, "");
buffer.append(matcher.group(1) + newString + matcher.group(2));
}
matcher.appendTail(buffer);
return buffer.toString();
}
So here's the final answer using a negative lookahead, my bad for not realizing * was greedy:
Pattern pat = Pattern.compile("(" +
leftContext + ")" + "(?:(?!" +
rightContext + ").)*" + "(" +
rightContext + ")", Pattern.DOTALL);
Your use of the word "overlapping" is confusing. Apparently, what you meant was that the regex is too greedy, matching everything from the first leftContext to the last rightContext. It seems you figured that out already--and came up with a better approach as well--but there's still at least one potential problem.
You said leftContext and rightContext are "plain Strings", by which I assume you meant they aren't supposed to be interpreted as regexes, but they will be. You need to escape them, or any regex metacharacters they contain will cause incorrect results or run-time exceptions. The same goes for your replacement string, although only $ and the backslash have special meanings there. Here's an example (notice the non-greedy .*?, too):
public static String replaceWithContext(String input, String leftContext, String rightContext, String newString){
String lcRegex = Pattern.quote(leftContext);
String rcRegex = Pattern.quote(rightContext);
String replace = Matcher.quoteReplacement(newString);
Pattern pat = Pattern.compile("(" + lcRegex + ").*?(" + rcRegex + ")", Pattern.DOTALL);
One other thing: if you aren't doing any post-match processing on the matched text, you can use replaceAll instead of rolling your own with appendReplacement and appendTail:
return input.replaceAll("(?s)(" + lcRegex + ")" +
"(?:(?!" + rcRegex + ").)*" +
"(" + rcRegex + ")",
"$1" + replace + "$2");
There are a few possibilities, depending on what you really need.
You can append $ at the end of your regex, like this:
"(" + leftContext + ")" + ".*" + "(" + rightContext + ")$"
so if rightContext isn't the last thing, your regex won't match.
Next, you can capture everything after rightContext:
"(" + leftContext + ")" + ".*" + "(" + rightContext + ")(.*)"
and after that discard everything in your third matching group.
But, since we don't know what leftContext and rightContext really are, maybe your problem lies within them.
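For reference, the quoting advice from the earlier answer can be assembled into a complete method. This is a sketch under the assumption that leftContext and rightContext are plain strings; it uses the non-greedy .*? form, and the class name is invented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextReplace {
    // Replaces whatever lies between each leftContext and the nearest
    // following rightContext with newString, keeping both contexts.
    static String replaceWithContext(String input, String leftContext,
                                     String rightContext, String newString) {
        String lcRegex = Pattern.quote(leftContext);            // contexts are literal text
        String rcRegex = Pattern.quote(rightContext);
        String replace = Matcher.quoteReplacement(newString);   // '$' and '\' are literal
        return input.replaceAll("(?s)(" + lcRegex + ").*?(" + rcRegex + ")",
                "$1" + replace + "$2");
    }

    public static void main(String[] args) {
        System.out.println(replaceWithContext("a [x] b [y] c", "[", "]", "Z")); // a [Z] b [Z] c
        System.out.println(replaceWithContext("L x R R", "L", "R", "-"));       // L-R R
    }
}
```

With a greedy .* instead of .*?, the first call would consume everything from the first [ to the last ] and produce a [Z] c.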
