Matching columns containing aggregates with regex


I'm trying to design a regular expression to identify certain columns in the string. This is the input string -
GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)
I'm trying to match SUM(TOTAL_AMOUNT) from above string.
This is the regex I've tried:
SUM([a-zA-Z])
But it's not able to match properly. Could someone tell me what I'm doing wrong with my regex here? Thanks in advance.
Sample Code:
List<String> input = new ArrayList<>();
Matcher m = Pattern.compile("SUM([a-zA-Z])").matcher(str);
while (m.find())
input.add(m.group(1));

You can use
String str = "GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)";
Matcher matcher = Pattern.compile("SUM\\([^()]+\\)").matcher(str);
List<String> input = new ArrayList<>();
while (matcher.find()) {
input.add(matcher.group());
}
System.out.println(input);
The pattern matches:
SUM\( - a SUM( string
[^()]+ - one or more chars other than ( and )
\) - a ) char.
Note that I am using matcher.group() in the code to get the full match since there is no capturing group in the pattern (thus, you can't use matcher.group(1) here).
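If only the column name inside the aggregate is needed, a capturing group around the negated character class returns it directly. The following is a minimal sketch of that variation, not part of the original answer; the class name SumColumns and the method name columnsInSum are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SumColumns {
    // SUM\(     - a literal "SUM("
    // ([^()]+)  - Group 1: one or more chars other than ( and )
    // \)        - a literal ")"
    private static final Pattern SUM_CALL = Pattern.compile("SUM\\(([^()]+)\\)");

    // Returns the argument of every SUM(...) call, in order of appearance.
    static List<String> columnsInSum(String expr) {
        List<String> columns = new ArrayList<>();
        Matcher m = SUM_CALL.matcher(expr);
        while (m.find()) {
            // The pattern now has a capturing group, so group(1) is valid here
            columns.add(m.group(1));
        }
        return columns;
    }

    public static void main(String[] args) {
        String str = "GENDER = Y OR (SUM(TOTAL_AMOUNT) > 100 AND SUM(TOTAL_AMOUNT) < 600)";
        System.out.println(columnsInSum(str)); // [TOTAL_AMOUNT, TOTAL_AMOUNT]
    }
}
```

With the group in place, matcher.group() still returns the full SUM(TOTAL_AMOUNT) text, so both values are available from the same match.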

Related

Extracting a group from matched String in Java using regex

I have a list of String containing values like this:
String [] arr = {"${US.IDX_CA}", "${UK.IDX_IO}", "${NZ.IDX_BO}", "${JP.IDX_TK}", "${US.IDX_MT}", "more-elements-with-completely-different-patterns-which-is-irrelevant"};
I'm trying to extract all the IDX_XX values from this list. So from the list above, I should get IDX_CA, IDX_IO, IDX_BO, etc., using regex in Java.
I wrote following code:
Pattern pattern = Pattern.compile("(.*)IDX_(\\w{2})");
for (String s : arr){
Matcher m = pattern.matcher(s);
if (m.matches()){
String extract = m.group(1);
System.out.println(extract);
}
}
But this does not print anything. Can someone please tell me what mistake I am making? Thanks.

Use the following fix:
String [] arr = {"${US.IDX_CA}", "${UK.IDX_IO}", "${NZ.IDX_BO}", "${JP.IDX_TK}", "${US.IDX_MT}", "more-elements-with-completely-different-patterns-which-is-irrelevant"};
Pattern pattern = Pattern.compile("\\bIDX_(\\w{2})\\b");
for (String s : arr){
Matcher m = pattern.matcher(s);
while (m.find()){
System.out.println(m.group(0)); // Get the whole match
System.out.println(m.group(1)); // Get the 2 chars after IDX_
}
}
Output:
IDX_CA
CA
IDX_IO
IO
IDX_BO
BO
IDX_TK
TK
IDX_MT
MT
NOTES:
The \bIDX_(\w{2})\b pattern matches IDX_ followed by 2 word chars, all between word boundaries, and captures the 2 chars after IDX_ into Group 1.
m.matches() requires the whole string to match, so it is replaced with m.find().
if is replaced with while in case there is more than one match in a string.
m.group(0) contains the whole match value.
m.group(1) contains the Group 1 value.
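The difference between matches() and find() in the notes above can be checked directly. This is a small sketch under the assumption that the same pattern is reused; the class name MatchesVsFind and both method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern IDX = Pattern.compile("\\bIDX_(\\w{2})\\b");

    // True only if the pattern covers the entire input string.
    static boolean wholeStringMatches(String s) {
        return IDX.matcher(s).matches();
    }

    // True if the pattern occurs anywhere inside the input string.
    static boolean containsMatch(String s) {
        return IDX.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("${US.IDX_CA}")); // false
        System.out.println(containsMatch("${US.IDX_CA}"));      // true
        System.out.println(wholeStringMatches("IDX_CA"));       // true
    }
}
```

The surrounding ${US. and } characters are what make matches() fail on the array elements, while find() skips over them.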

Regex Fetch value from string

I am very new to Regex.
I have a String from which I am trying to fetch a value.
String conditionExpression= "{action==\"Submit\" && orgType== \"supply\"}";
Matcher matcher = Pattern.compile("(?<=orgType==)\"[^\"]+\"").matcher(conditionExpression);
if (matcher.find()) {
orgType = matcher.group().replaceAll("\"", "");
}
Input will be String : "{action=="Submit" && orgType== "supply"}"
Output will be value of orgType: supply
I tried fetching orgType using a regex, but it's returning null. Is anything wrong here?

You need to account for whitespace that may appear around the equals sign. Besides, there is no need to post-process the match value if you use a capturing group around [^"]+.
Here is a fixed code:
String orgType = "";
String conditionExpression= "{action==\"Submit\" && orgType== \"supply\"}";
Matcher matcher = Pattern.compile("orgType\\s*==\\s*\"([^\"]*)\"").matcher(conditionExpression);
if (matcher.find()) {
orgType = matcher.group(1);
}
System.out.println(orgType); // => supply
The \\s*==\\s* part of the pattern matches == enclosed with 0+ whitespace chars.
The ([^\"]*) pattern is a capturing group that pushes a submatch value into Group 1 that you can retrieve via matcher.group(1) (no need to remove double quotes later).
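The same idea can be wrapped in a helper that looks up any key, as a rough sketch. The class name ConditionValue and the method valueOf are hypothetical, and the leading \b assumes the key starts with a word character; Pattern.quote keeps the key from being interpreted as regex syntax.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConditionValue {
    // Finds key == "value" (with optional whitespace around ==) and returns value.
    // Assumption: key starts with a word character, so \b is meaningful before it.
    static Optional<String> valueOf(String expression, String key) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\s*==\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(expression);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String conditionExpression = "{action==\"Submit\" && orgType== \"supply\"}";
        System.out.println(valueOf(conditionExpression, "orgType")); // Optional[supply]
        System.out.println(valueOf(conditionExpression, "action"));  // Optional[Submit]
        System.out.println(valueOf(conditionExpression, "missing")); // Optional.empty
    }
}
```

Returning Optional avoids the null result that the original code produced when nothing matched.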

Regex matcher - No match found

I am trying to use Regex to extract the values from a string and use them for the further processing.
The string I have is :
String tring =Format_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))
<<<$gen>>>(((value 13231)))
Regex pattern I have made is :
Pattern p = Pattern.compile("\\<{3}\\$([\\w ]+)\\>{3}\\s?\\({3}([\\w ]+)\\){3}");
When I am running the whole program
Matcher m = p.matcher(tring);
String[] try1 = new String[m.groupCount()];
for(int i = 1 ; i<= m.groupCount();i++)
{
try1[i] = m.group(i);
//System.out.println("group - i" +try1[i]+"\n");
}
I am getting
No match found
Can anybody help me with this? Where exactly is this going wrong?
My first aim is just to see whether I can get the values in the corresponding groups. If that works, I would like to use them for further processing.
Thanks

Here is an example of how to get all the values you need with find():
String tring = "CHARDATA_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))\n<<<$gen>>>(((value 13231)))";
Pattern p = Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");
Matcher m = p.matcher(tring);
while (m.find()){
System.out.println("Gen: " + m.group(1) + ", and value: " + m.group(2));
}
Note that you do not have to escape < and > in Java regex.

After you create the Matcher and before you reference its groups, you must call one of the methods that attempts the actual match, like find, matches, or lookingAt. For example:
Matcher m = p.matcher(tring);
if (!m.find()) return; // <---- Add something like this
String[] try1 = new String[m.groupCount()];
You should read the javadocs on the Matcher class to decide which of the above methods makes sense for your data and application. http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
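To illustrate the point about attempting a match first, here is a hedged sketch that collects every group of every match. The class name GroupsPerMatch and its method names are hypothetical. It also sizes the array as groupCount() + 1, because groupCount() does not include group 0, so indexing an array of length groupCount() up to groupCount() would overflow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsPerMatch {
    private static final Pattern P =
        Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");

    // Returns one array per match: index 0 is the whole match, 1..n are the groups.
    static List<String[]> allGroups(String s) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) { // find() performs the actual match attempt
            String[] groups = new String[m.groupCount() + 1];
            for (int i = 0; i <= m.groupCount(); i++) {
                groups[i] = m.group(i);
            }
            rows.add(groups);
        }
        return rows;
    }

    // Shows that group() fails when no match has been attempted yet.
    static boolean groupBeforeFindThrows(String s) {
        try {
            P.matcher(s).group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String tring = "CHARDATA_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))\n<<<$gen>>>(((value 13231)))";
        for (String[] g : allGroups(tring)) {
            System.out.println("Gen: " + g[1] + ", and value: " + g[2]);
        }
        System.out.println(groupBeforeFindThrows(tring)); // true
    }
}
```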

Java regex except combination of symbols

I'm trying to find a substring that contains any characters but stops before the combination "[%".
As examples:
Input: atrololo[%trololo
Output: atrololo
Input: tro[tro%tro[%trololo
Output: tro[tro%tro
I already wrote a regex that takes any symbol except [ or %:
[A-Za-z-0-9\s!-$-&/:-#\\-`\{-~]*
I must put something like [^("[%")] at the end of my expression, but I can't work out how to write it.
You can check my regex at
https://www.regex101.com/
Use this as the test string:
sdfasdsdfasa##!55#321!2h/ хf[[[[[sds d
asgfdgsdf[[[%for (int i = 0; i < 5; i++){}%]
[% fo%][%r(int i = 0; i < 5; i++){ %]*[%}%]
[%for(int i = 0; i < 5; i++){%][%=i%][%}%]
[%#n%]<[%# n + m %]*[%#%]>[%#%]
%?s.equals(""TEST"")%]TRUE[%#3%]![%#%][%?%]
Kind regards.

You could use a negative lookahead based regex like below to get the part before the [%
^(?:(?!\[%).)*
(?:(?!\[%).)* matches any character, zero or more times, as long as that character does not start a [% sequence.
String s = "tro[tro%tro[%trololo";
Pattern regex = Pattern.compile("^(?:(?!\\[%).)*");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group()); // output : tro[tro%tro
}
OR
A lookahead based regex,
^.*?(?=\[%)
Pattern regex = Pattern.compile("^.*?(?=\\[%)");
OR
You could split the input string based on the regex \[% and get the parts you want.
String s = "tro[tro%tro[%trololo";
String[] part = s.split("\\[%");
System.out.println(part[0]); // output : tro[tro%tro
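The three approaches agree when [% is present but behave differently when it is absent, which may matter for some inputs. The sketch below compares them; the class name BeforeMarker and its method names are hypothetical, and the split variant uses a limit of 2, which is a small change from the answer's call made here so that only the first separator is considered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeforeMarker {
    private static final Pattern TEMPERED = Pattern.compile("^(?:(?!\\[%).)*");
    private static final Pattern LAZY = Pattern.compile("^.*?(?=\\[%)");

    // Negative-lookahead version: also matches when "[%" never occurs.
    static String tempered(String s) {
        Matcher m = TEMPERED.matcher(s);
        return m.find() ? m.group() : null;
    }

    // Positive-lookahead version: needs "[%" to be present, otherwise no match.
    static String lazy(String s) {
        Matcher m = LAZY.matcher(s);
        return m.find() ? m.group() : null;
    }

    // Split version: the text before the first "[%", or the whole string.
    static String viaSplit(String s) {
        return s.split("\\[%", 2)[0];
    }

    public static void main(String[] args) {
        String s = "tro[tro%tro[%trololo";
        System.out.println(tempered(s)); // tro[tro%tro
        System.out.println(lazy(s));     // tro[tro%tro
        System.out.println(viaSplit(s)); // tro[tro%tro

        String plain = "plain";          // no "[%" at all
        System.out.println(tempered(plain)); // plain
        System.out.println(lazy(plain));     // null
        System.out.println(viaSplit(plain)); // plain
    }
}
```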

Using your input/output pairs as the spec:
String input = "tro[tro%tro[%trololo"; // the starting string
String output = input.replaceAll("\\[%.*", ""); // tro[tro%tro

Pattern/Matcher group() to obtain substring in Java?

UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)

You use m.matches(), which tries to match the whole string. If you use m.find(), it will search for a match inside the string. I also improved your regexp a bit to exclude the TZID= prefix using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}

This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group(0) is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.

You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning up to TZID=, then TZID= matches literally. The group then captures everything up to the :, after which : matches and .* matches the rest of the String. You can now get what you need from group(1).

You are missing a dot before the asterisk. Your expression will match any number of uppercase Ds.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
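To see why the original TZID*: pattern never hits, it helps to test what it does match. This is a minimal sketch; the class name TzidPattern and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TzidPattern {
    // True if regex occurs anywhere in s.
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // Returns the text between "TZID=" and the next ":", or null if absent.
    static String zone(String s) {
        Matcher m = Pattern.compile("TZID=([^:]*):").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";
        // "TZID*:" means T, Z, I, then zero or more D, then ":"
        System.out.println(found("TZID*:", "TZI:"));    // true
        System.out.println(found("TZID*:", "TZIDDD:")); // true
        System.out.println(found("TZID*:", s));         // false
        System.out.println(zone(s));                    // America/Mexico_City
    }
}
```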

Why not simply use split, like this:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];
