Extract Strings between parentheses from String

I am working on my own little scripting language. I am now stuck on the script reader, which reads the file and converts it into script parts (variables, functions, if clauses, etc.). I have a String that holds the script file's content on one line, like so:
mega = "";test(){hallo();test = "";if(){p2 = -1;}else{p1 = "";}hi = 0;}
Now my problem: I want to get the strings between the parentheses (the curly braces) of the if clauses and of the function (which surrounds the if clause). I tried it with a regex:
Matcher m = Pattern.compile("\\(([^)]+)\\)").matcher(example);
while (m.find()) {
    System.out.println(m.group(1));
}
But then I get the content between the first parenthesis of the function test(){ and the first parenthesis that closes the if clause. How can I extract the strings inside the if clause, the else clause and the function?
EDIT:
This is what I want to get after the extraction:
For the If: p2 = -1;
For the Else: p1 = "";
And for the surrounding function:
hallo();test = "";if(){p2 = -1;}else{p1 = "";}hi = 0;
EDIT 2:
I would like this to work recursively to any depth (for example, when the if clause contains another if clause, and so on).

I don't think regular expressions can do that. You are looking for matching pairs of curly brackets, and those can nest.
This reminds me of "RegEx match open tags except XHTML self-contained tags".
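If a regex won't do, one option is to count brace depth by hand. The following is only a sketch of that idea, not a full parser: BlockExtractor and extractBlocks are names made up for this example, and it assumes braces never appear inside string literals or comments of the script.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockExtractor {

    /** Returns the body of every {...} block; each outer block comes before the blocks nested in it. */
    static List<String> extractBlocks(String src) {
        List<String> blocks = new ArrayList<>();
        collect(src, blocks);
        return blocks;
    }

    private static void collect(String src, List<String> out) {
        int depth = 0;   // how many braces are currently open
        int start = -1;  // index just after the opening brace of the current top-level block
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    throw new IllegalArgumentException("Unbalanced '}' at index " + i);
                }
                depth--;
                if (depth == 0) {
                    String body = src.substring(start, i);
                    out.add(body);
                    collect(body, out); // recurse to pick up nested blocks at any depth
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Missing closing '}'");
        }
    }

    public static void main(String[] args) {
        String example = "mega = \"\";test(){hallo();test = \"\";if(){p2 = -1;}else{p1 = \"\";}hi = 0;}";
        extractBlocks(example).forEach(System.out::println);
    }
}
```

For the example string from the question this yields the function body first, followed by the if body and the else body. Telling which body belongs to which keyword would need the text in front of each opening brace as well, which this sketch does not record.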

Related

(JAVA) If replacing previous if statement

Quick question. I have the following code:
//Cookie
if (mType.getText().toString().toLowerCase().contains("Cookie")) {
mResults.setText(mType.getText().toString().replaceAll("(?i)\\bCookie\\b", "Dough"));
}
//Cola
if (mType.getText().toString().toLowerCase().contains("Cola")) {
mResults.setText(mType.getText().toString().replaceAll("(?i)\\bCola\\b", "Sprite"));
}
I have a script which allows me to replace certain words from user input. When a user inputs the word "Cookie", the script replaces the word with "Dough". If, however, the user types in "Cookie Cola", the script only replaces the word "Cola" with "Sprite" and discards the previous if statement which replaces the word "Cookie".
So, how can I create a script which allows me to replace multiple words from input without discarding previous if statements?

You can just store it in a variable, like this:
String value = mType.getText().toString().toLowerCase();
if (value.contains("soda")) { value = value.replaceAll("soda", "cola"); }
if (...) {...}
But technically you are doing more work than necessary. You can chain your replacements:
String value = mType.getText().toString().toLowerCase();
value = value.replaceAll("soda", "cola").replaceAll("...", "...")... // etc.

To answer your question: in both ifs you're calling replaceAll on mType.getText(), but you never change that value. So the second replacement runs on the original string, and its setText overwrites the result of the first. Here's how the code should look:
String text = mType.getText().toString();
text = text.replaceAll("(?i)\\bCookie\\b", "Dough");
text = text.replaceAll("(?i)\\bCola\\b", "Sprite");
mResults.setText(text);

Chaining multiple String.replaceAll() calls achieves the same result without any if statements. For example:
String result = mType.getText().toString().toLowerCase();
result = result.replaceAll("cola", "sprite")
               .replaceAll("cookie", "dough"); // And so on...
mResults.setText(result);
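For what it's worth, here is a small sketch of the same idea on a plain String, so it can be tried without the Android views. WordReplacer and replaceWords are made-up names for this example; the word pairs come from the question. Each replacement is applied to the result of the previous one, which is the point the answers above make.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordReplacer {

    /** Applies every replacement in turn, each one to the result of the previous one. */
    static String replaceWords(String input, Map<String, String> replacements) {
        String result = input;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            // Whole-word, case-insensitive match; quote() keeps the key from being read as a regex.
            Pattern word = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            // quoteReplacement() keeps '$' and '\' in the replacement text literal.
            result = word.matcher(result).replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("Cookie", "Dough");
        replacements.put("Cola", "Sprite");
        System.out.println(replaceWords("Cookie Cola", replacements));
    }
}
```

In the app, the input would come from mType.getText().toString() and the result would go to mResults.setText(...).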

Regular Expression not working in Java

I have the string "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')".
I need to return the string that is between "SELECTOR(" and ")"; the result should be: '(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')'.
My code with the regular expression is:
public static String extractSelector(String txt){
Pattern pattern = Pattern.compile("^SELECTOR\\((.*)\\)$");
Matcher m = pattern.matcher(txt);
String s = null;
while (m.find()) {
s = m.group(1);
}
return s;
}
Where txt = "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')"
But this always returns null. Why is that?

This jdoodle demonstrates that this part of the code works (I only added a declaration for txt and removed the return statement). Are you sure nothing is done to txt before that?
Given your code, the only explanation is that txt does not fully match the pattern. In that case the while loop never executes (find() is always false) and s stays null.
Perhaps there are spaces between SELECTOR and the opening parenthesis, or perhaps SELECTOR is in lowercase. To debug this, you can write
System.out.println("\""+txt+"\"");
before matching, or use the debugger of your IDE. The quotes make leading and trailing spaces visible.
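If stray whitespace or letter case turns out to be the culprit, a more tolerant pattern is one way around it. This is only a sketch under that assumption; SelectorExtractor is a made-up class name, and the extra \\s* parts and the CASE_INSENSITIVE flag are additions that are not in the question's pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SelectorExtractor {

    // Tolerates surrounding whitespace, whitespace before '(' and any letter case in SELECTOR.
    private static final Pattern SELECTOR = Pattern.compile(
            "^\\s*SELECTOR\\s*\\((.*)\\)\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text between "SELECTOR(" and the last ")", or null if txt does not match. */
    static String extractSelector(String txt) {
        Matcher m = SELECTOR.matcher(txt);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String txt = "SELECTOR('(namespace=''ar.com.osde.reintegros'' and eventname=''generacionReintegro'')')";
        System.out.println(extractSelector(txt));
    }
}
```

Because the pattern is anchored with ^ and $, a single find() is enough; the while loop from the question is not needed.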

How to best strip out certain strings in a file?

If I have a file with the following content:
11:17 GET this is my content #2013
11:18 GET this is my content #2014
11:19 GET this is my content #2015
How can I use a Scanner and ignore certain parts of a String line = scanner.nextLine();?
The result that I like to have would be:
this is my content
this is my content
this is my content
So I'd like to strip everything from the start up to and including GET, and then take everything up to the # char.
How could this easily be done?

You can use the String.indexOf(String str) and String.indexOf(char ch) methods. For example:
String line = scanner.nextLine();
int start = line.indexOf("GET");
int end = line.indexOf('#');
String result = line.substring(start + 4, end);
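The snippet above throws a StringIndexOutOfBoundsException when a line contains no "GET" or no '#'. A guarded variant could look like the sketch below; LineStripper and contentBetween are hypothetical names, and returning null for lines that do not fit is just one possible choice.

```java
final class LineStripper {

    /**
     * Returns the trimmed text between the first startMarker and the next endMarker,
     * or null if either marker is missing.
     */
    static String contentBetween(String line, String startMarker, char endMarker) {
        int start = line.indexOf(startMarker);
        if (start < 0) {
            return null; // no start marker on this line
        }
        start += startMarker.length();
        int end = line.indexOf(endMarker, start); // only look after the start marker
        if (end < 0) {
            return null; // no end marker after the start marker
        }
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(contentBetween("11:17 GET this is my content #2013", "GET", '#'));
    }
}
```

Using trim() instead of the fixed "+ 4" offset means the result does not depend on exactly one space following GET.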

One way might be
String strippedStart = scanner.nextLine().split(" ", 3)[2];
String result = strippedStart.substring(0, strippedStart.lastIndexOf("#")).trim();
This assumes there are always two space-separated tokens at the beginning (11:22 GET or 11:33 POST, for example).

You could do something like this:-
String line ="11:17 GET this is my content #2013";
int startIndex = line.indexOf("GET ");
int endIndex = line.indexOf("#");
line = line.substring(startIndex+4, endIndex-1);
System.out.println(line);

In my opinion the best solution for your problem would be a Java regex. With a regex you can define which group or groups of text you want to retrieve and what kind of text comes where. I haven't worked with Java in a long time, so this is off the top of my head, but I'll try to point you in the right direction.
First off, compile a pattern:
Pattern pattern = Pattern.compile("^\\d{1,2}:\\d{1,2} GET (.*?) #\\d+$", Pattern.MULTILINE);
First part of the regex says that you expect one or two digits followed by a colon followed by one or two digits again. After that comes the GET (you can use GET|POST if you expect those words or \w+? if you expect any word). Then you define the group you want with the parentheses. Lastly, you put the hash and any number of digits with at least one digit. You might consider putting flags DOTALL and CASE_INSENSITIVE, although I don't think you'll be needing them.
Then you continue with the matcher:
Matcher matcher = pattern.matcher(textToParse);
while (matcher.find())
{
//extract groups here
String group = matcher.group(1);
}
In the while loop you can use matcher.group(1) to find the text in the group you selected with the parentheses (the text you'd like extracted). matcher.group(0) gives the entire find, which is not what you're currently looking for (I guess).
Sorry for any errors in the code, it has not been tested. Hope this puts you on the right track.
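Since the code above is untested, here is a self-contained sketch of the same pattern run against the three sample lines from the question. LogContentExtractor and extract are names invented for this example. Note that inside a Java string literal every backslash of the regex has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogContentExtractor {

    // MULTILINE makes ^ and $ match at the start and end of every line, not just of the whole text.
    private static final Pattern LINE =
            Pattern.compile("^\\d{1,2}:\\d{1,2} GET (.*?) #\\d+$", Pattern.MULTILINE);

    /** Returns group 1 (the text between "GET " and " #digits") for every matching line. */
    static List<String> extract(String text) {
        List<String> contents = new ArrayList<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) {
            contents.add(matcher.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String text = "11:17 GET this is my content #2013\n"
                + "11:18 GET this is my content #2014\n"
                + "11:19 GET this is my content #2015";
        extract(text).forEach(System.out::println);
    }
}
```

Lines that do not fit the pattern are simply skipped, which may or may not be what you want for a real log file.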

You can try this rather flexible solution:
Scanner s = new Scanner(new File("data"));
Pattern p = Pattern.compile("^(.+?)\\s+(.+?)\\s+(.*)\\s+(.+?)$");
Matcher m;
while (s.hasNextLine()) {
m = p.matcher(s.nextLine());
if (m.find()) {
System.out.println(m.group(3));
}
}
This piece of code drops the first, second and last words of every line and prints what remains.
The advantage is that it relies on whitespace rather than on specific string literals to do the stripping.

Regular expression, value in between quotes

I'm having a little trouble constructing a regular expression in Java.
The constraint is that I need to split a string separated by !. The two strings will be enclosed in double quotes.
For example:
"value"!"value"
If I performed a java split() on the string above, I want to get:
value
value
However, the catch is that value can contain any characters: punctuation, digits, spaces, etc.
So here's a more concrete example. Input:
""he! "l0"!"wor!"d1"
Java's split() should return:
"he! "l0
wor!"d1
Any help is much appreciated. Thanks!

Try this expression: (".*")\s*!\s*(".*")
Although it would not work with split, it should work with Pattern and Matcher and return the 2 strings as groups.
String input = "\" \"he\"\"\"\"! \"l0\" ! \"wor!\"d1\"";
Pattern p = Pattern.compile("(\".*\")\\s*!\\s*(\".*\")");
Matcher m = p.matcher(input);
if(m.matches())
{
String s1 = m.group(1); //" "he""""! "l0"
String s2 = m.group(2); //"wor!"d1"
}
Edit:
This would not work in all cases; for example, "he"!"llo" ! "w" ! "orld" would give the wrong groups. In that case it would be really hard to determine which ! is the separator. That's why rarely used characters are often chosen to separate parts of a string, like @ in email addresses :)

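Split the value on "!" (including the quotes) instead of on ! alone:
String REGEX = "\"!\"";
String INPUT = "\"\"he! \"l0\"!\"wor!\"d1\"";
Pattern p = Pattern.compile(REGEX);
String[] items = p.split(INPUT);
Note that the first item keeps its leading quote and the last item keeps its trailing quote, so those still have to be stripped.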

It feels like you need to parse on:
DOUBLEQUOTE = "
OTHER = anything that isn't a double quote
EXCLAMATION = !
ITEM = DOUBLEQUOTE (OTHER | (DOUBLEQUOTE OTHER DOUBLEQUOTE))* DOUBLEQUOTE
LINE = ITEM (EXCLAMATION ITEM)*
It feels like it's possible to create a regular expression for the above (assuming the double quotes in an ITEM can't be nested even further), BUT it might be better served by a very simple grammar.
This might work... excusing missing escapes and the like
^"([^"]*|"[^"]*")*"(!"([^"]*|"[^"]*")*")*$
Another option would be to match against the first part and then, if there's a ! and more, prune off the ! and keep matching (excuse the no-particular-language pseudocode, I'm just trying to illustrate the idea):
resultList = []
while (string matches ^("([^"]*|"[^"]*")*")(.*)$ => item, rest) {
    resultList += item
    string = rest
    if (string.beginsWith("!")) {
        string = string[1:end]
    } elseif (string.length > 0) {
        // throw an error, since there was no exclamation mark and the string isn't done
    }
}
if (string.length > 0) {
    // throw an exception since the string isn't done
}
// resultList now holds the list of items in the string
EDIT: I realized that my answer doesn't really work. You can have a single doublequote inside the strings, as well as exclamation marks. As such, you really CAN'T have "!" inside one of the strings. As such, the idea of 1) pull quotes off the ends, 2) split on '"!"' is really the right way to go.
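A minimal Java sketch of those two steps, assuming (as the EDIT concludes) that the three-character sequence "!" never occurs inside a value. QuotedSplitter and splitQuoted are made-up names for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class QuotedSplitter {

    /** Strips the outermost quotes, then splits on the literal three-character separator "!". */
    static String[] splitQuoted(String input) {
        if (input.length() < 2
                || input.charAt(0) != '"'
                || input.charAt(input.length() - 1) != '"') {
            throw new IllegalArgumentException("Input must start and end with a double quote: " + input);
        }
        // Step 1: pull the quotes off both ends.
        String inner = input.substring(1, input.length() - 1);
        // Step 2: split on "!" taken literally; the limit -1 keeps empty trailing values.
        return inner.split(Pattern.quote("\"!\""), -1);
    }

    public static void main(String[] args) {
        String input = "\"\"he! \"l0\"!\"wor!\"d1\"";
        System.out.println(Arrays.toString(splitQuoted(input)));
    }
}
```

For the input from the question this gives the two items the asker listed: "he! "l0 and wor!"d1.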

How to create article spinner regex in Java?

Say for example I want to take this phrase:
{{Hello|What's Up|Howdy} {world|planet} |
{Goodbye|Later}
{people|citizens|inhabitants}}
and randomly make it into one of the following:
Hello world
Goodbye people
What's Up world
What's Up planet
Later citizens
etc.
The basic idea is that enclosed within every pair of braces will be an unlimited number of choices separated by "|". The program needs to go through and randomly choose one choice for each set of braces. Keep in mind that braces can be nested endlessly within each other. I found a thread about this and tried to convert it to Java, but it did not work. Here is the python code that supposedly worked:
import re
from random import randint
def select(m):
    choices = m.group(1).split('|')
    return choices[randint(0, len(choices)-1)]
def spinner(s):
    r = re.compile('{([^{}]*)}')
    while True:
        s, n = r.subn(select, s)
        if n == 0: break
    return s.strip()
Here is my attempt to convert that Python code to Java.
public String generateSpun(String text){
String spun = new String(text);
Pattern reg = Pattern.compile("{([^{}]*)}");
Matcher matcher = reg.matcher(spun);
while (matcher.find()){
spun = matcher.replaceFirst(select(matcher.group()));
}
return spun;
}
private String select(String m){
String[] choices = m.split("|");
Random random = new Random();
int index = random.nextInt(choices.length - 1);
return choices[index];
}
Unfortunately, when I try to test this by calling
generateAd("{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}");
in the main method of my program, I get an error on the line in generateSpun where Pattern reg is declared. It is a PatternSyntaxException:
java.util.regex.PatternSyntaxException: Illegal repetition
{([^{}]*)}
Can someone try to create a Java method that will do what I am trying to do?

Here are some of the problems with your current code:
You should reuse your compiled Pattern, instead of Pattern.compile every time
You should reuse your Random, instead of new Random every time
Be aware that String.split is regex-based, so you must split("\\|")
Be aware that curly braces in Java regex must be escaped to match literally, so Pattern.compile("\\{([^{}]*)\\}");
You should query group(1), not group() which defaults to group 0
You're using replaceFirst wrong, look up Matcher.appendReplacement/Tail instead
Random.nextInt(int n) has an exclusive upper bound (like many such methods in Java), so nextInt(choices.length - 1) can never pick the last choice; use nextInt(choices.length)
The algorithm itself actually does not handle arbitrarily nested braces properly
Note that escaping is done by preceding with \, and as a Java string literal it needs to be doubled (i.e. "\\" contains a single character, the backslash).
Attachment
Source code and output with above fix but no major change to algorithm
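The original attachment is not reproduced here. As a stand-in, the following is a sketch of how the points in the list above could fit together. It keeps the approach of the Python code (resolve the innermost brace groups, repeat until no braces remain) and is not a definitive implementation. Spinner and spin are names chosen for this example.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Spinner {

    // Compiled once and reused. Matches an innermost group: braces with no braces inside.
    private static final Pattern GROUP = Pattern.compile("\\{([^{}]*)\\}");
    // Created once and reused.
    private static final Random RANDOM = new Random();

    static String spin(String text) {
        String current = text;
        while (true) {
            Matcher m = GROUP.matcher(current);
            StringBuffer sb = new StringBuffer();
            boolean replaced = false;
            while (m.find()) {
                // group(1) is the text between the braces; split is regex-based, so '|' is escaped.
                String[] choices = m.group(1).split("\\|", -1);
                // nextInt's upper bound is exclusive, so every index can be picked.
                String pick = choices[RANDOM.nextInt(choices.length)];
                // quoteReplacement keeps '$' and '\' in the chosen text literal.
                m.appendReplacement(sb, Matcher.quoteReplacement(pick));
                replaced = true;
            }
            if (!replaced) {
                return current.trim(); // no brace groups left
            }
            m.appendTail(sb);
            current = sb.toString(); // outer groups may have become innermost; go again
        }
    }

    public static void main(String[] args) {
        System.out.println(spin(
                "{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}"));
    }
}
```

A fresh Matcher is created on every pass because the string changes between passes. Unbalanced braces are left in the output as they are.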

To fix the regex, add backslashes before the outer { and }. These are meta-characters in Java regexes. However, I don't think that will result in a working program. You are modifying the variable spun after it has been bound to the regex, and I do not think the returned Matcher will reflect the updated value.
I also don't think the python code will work for nested choices. Have you actually tried the python code? You say it "supposedly works", but it would be wise to verify that before you spend a lot of time porting it to Java.

Well, I just created one in PHP and Python; there is a demo at http://spin.developerscrib.com. It's at a very early stage, so it might not work as expected. The source code is on GitHub: https://github.com/razzbee/razzy-spinner

Use appendReplacement and appendTail like this; it worked well for me:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
and here
private String select(String m){
String[] choices = m.split("|");
Random random = new Random();
int index = random.nextInt(choices.length - 1);
return choices[index];
}
Instead of m.split("|"), use m.split("\\|").
Otherwise it splits between each and every character.
Also use Pattern.compile("\\{([^{}]*)\\}");
