Splitting and Parsing formula String - java

I have the formula below:
(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)
I want to split it and extract only the string values that are before the colon.
The final output should be:
{ "BAO","COUNT","UPI","SOS" }
Thanks in advance,

You can use a positive lookbehind, as in the regex pattern below, to get the run of word characters after each colon:
(?<=:)[^\W]+
Pattern explanation:
(?<= look behind to see if there is:
: ':'
) end of look-behind
[^\W]+ any character except: non-word characters
(all but a-z, A-Z, 0-9, _) (1 or more times)
Sample code:
String str="(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)";
Pattern p=Pattern.compile("(?<=:)[^\\W]+");
Matcher m=p.matcher(str);
while(m.find()){
System.out.println(m.group());
}
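Since the question asks for the values as a collection rather than printed lines, the same lookbehind can feed a list. This is a minimal sketch: the class name FormulaFields and the helper fieldsAfterColon are hypothetical names chosen for illustration, and \w+ is used in place of the equivalent [^\W]+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaFields {
    // Word characters that directly follow a colon, e.g. "BAO" in "Trig01:BAO".
    private static final Pattern AFTER_COLON = Pattern.compile("(?<=:)\\w+");

    // Hypothetical helper: returns the names after each colon, in order of appearance.
    static List<String> fieldsAfterColon(String formula) {
        List<String> fields = new ArrayList<>();
        Matcher m = AFTER_COLON.matcher(formula);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String str = "(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)";
        String[] result = fieldsAfterColon(str).toArray(new String[0]);
        System.out.println(String.join(",", result)); // BAO,COUNT,UPI,SOS
    }
}
```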

Use Regex, try this:
public static List<String> extractSubstringsFromAllMatches(String sourceString, String pattern) {
Pattern regexPattern = Pattern.compile(pattern);
Matcher matcher = regexPattern.matcher(sourceString);
List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group(1));
}
return matches;
}
Get the results you require by calling:
extractSubstringsFromAllMatches(YourString,":(\\w*)\\W")

Try this one-line solution:
String[] arr = str.replaceAll("^.*?(?=\\w+:)|:[^:]*$", "").split(":.*?(?=\\w+(:|$))");
This works by first stripping off the leading and trailing non-target chars, then splitting on the intervening chars. Matching is done using lookaheads, which assert, but don't capture, that a word followed by a colon comes next.
Here's some test code:
String str = "(Trig01:BAO)/(((Trig02:COUNT*86400)-Trig03:UPI-Trig04:SOS)*2000)";
String[] arr = str.replaceAll("^.*?(?=\\w+:)|:[^:]*$", "").split(":.*?(?=\\w+(:|$))");
System.out.println(Arrays.toString(arr));
Output:
[Trig01, Trig02, Trig03, Trig04]
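If the one-liner is hard to read, the same names can probably be collected with a find loop and a single lookahead. This is only a sketch of an alternative, not the answer's method; FormulaPrefixes and namesBeforeColon are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaPrefixes {
    // Hypothetical helper: collects each run of word characters
    // that is immediately followed by a colon.
    static List<String> namesBeforeColon(String formula) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+(?=:)").matcher(formula);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        String str = "(Trig01:BAO)/(((Trig02:COUNT*86400)-Trig03:UPI-Trig04:SOS)*2000)";
        System.out.println(namesBeforeColon(str)); // [Trig01, Trig02, Trig03, Trig04]
    }
}
```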

Related

Regex for extracting digits in version format

I want to extract numbers from a string. The numbers represent a version.
That is, I want to match numbers which are between:
_ and /
/ and /
I have prepared the following regex, but it doesn't work as expected:
.*[\/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})\/.*
For the following example, the regex should match twice:
Input: name_1.1.1/9.10.0/abc. Expected result: 1.1.1 and 9.10.0
However, my regex returns only 9.10.0; 1.1.1 is omitted. Do you have any idea what is wrong?
You could just split the string on _ or /, and then retain components which appear to be versions:
List<String> versions = new ArrayList<>();
String input = "name_1.1.1/9.10.0/abc";
String[] parts = input.split("[_/]");
for (String part : parts) {
if (part.matches("\\d+(?:\\.\\d+)*")) {
versions.add(part);
}
}
System.out.println(versions); // [1.1.1, 9.10.0]
You can assert the / at the end instead of matching it, and omit the .*
Note that you don't have to escape the /
[/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})(?=/)
Example code
String regex = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";
String string = "name_1.1.1/9.10.0/abc";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1.1.1
9.10.0
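To see why both changes matter, the sketch below runs three variants against the input from the question: the original pattern, the pattern without the surrounding .* but still consuming the trailing /, and the lookahead version. The class VersionPatterns and the helper firstGroups are hypothetical names used only for this comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionPatterns {
    // Hypothetical helper: returns group 1 of every match that find() reports.
    static List<String> firstGroups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "name_1.1.1/9.10.0/abc";
        String version = "(\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})";

        // Greedy .* swallows the whole input, so there is a single match
        // and it backtracks only as far as the last version.
        System.out.println(firstGroups(".*[/_]" + version + "/.*", input)); // [9.10.0]

        // Consuming the trailing / leaves no delimiter in front of 9.10.0.
        System.out.println(firstGroups("[/_]" + version + "/", input)); // [1.1.1]

        // Asserting the / keeps it available as the next leading delimiter.
        System.out.println(firstGroups("[/_]" + version + "(?=/)", input)); // [1.1.1, 9.10.0]
    }
}
```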
Another option could be using a positive lookbehind to assert either a / or _ to the left, and get a match only.
(?<=[/_])\d{1,2}[.]\d{1,2}[.]\d{1,2}(?=/)
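A minimal sketch of using the lookbehind pattern from Java follows. Because both delimiters are only asserted, the whole match is the version and no capturing group is needed. VersionLookaround and versions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionLookaround {
    // A / or _ must precede the version and a / must follow it;
    // neither delimiter is part of the match.
    private static final Pattern VERSION =
            Pattern.compile("(?<=[/_])\\d{1,2}[.]\\d{1,2}[.]\\d{1,2}(?=/)");

    // Hypothetical helper: returns every version found in the input.
    static List<String> versions(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = VERSION.matcher(input);
        while (m.find()) {
            found.add(m.group()); // whole match, no group index needed
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(versions("name_1.1.1/9.10.0/abc")); // [1.1.1, 9.10.0]
    }
}
```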
Code Demo
String regex = "(\\d+\\.\\d+\\.\\d+)"; // escape the dots so they match a literal '.' only
String string = "name_1.1.1/9.10.0/abc";
String string2 = "randomversion4.5.6/09.7.8_9.88.9";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
Matcher matcher2 = pattern.matcher(string2);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
while (matcher2.find()) {
System.out.println(matcher2.group(1));
}
Out:
1.1.1
9.10.0
4.5.6
09.7.8
9.88.9
Write the regex for just the part you want to match, in this case the version number.
A regex can be used either to match a whole string or to find a substring within a string.
When you search for a substring, you cannot anticipate every file name or surrounding text, so match only the part you want to find.
That way the versions are found no matter which string they appear in.

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both the inputs
You need to capture all characters other than / after /test/:
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape / since it is not a special character and one needs no regex delimiters.
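The snippet above checks only the input with the trailing segment. As a small sketch, the same pattern can be wrapped in a helper and run against both inputs from the question; PathSegment and segmentAfterTest are hypothetical names, and returning null for a non-matching path is an assumption of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSegment {
    // Captures the path segment right after /test/, stopping at the next / or at the end.
    private static final Pattern AFTER_TEST = Pattern.compile("/test/([^/]+)");

    // Hypothetical helper: returns the captured segment, or null when /test/ is absent.
    static String segmentAfterTest(String path) {
        Matcher m = AFTER_TEST.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(segmentAfterTest("/something/test/{abc}/listed")); // {abc}
        System.out.println(segmentAfterTest("/something/test/{abc}"));        // {abc}
    }
}
```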
This should work for you :
public static void main(String[] args) {
String s1 = "/something/test/{abc}/listed";
String s2 = "/something/test/{abc}";
System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
O/P :
{abc}
{abc}
Regex (as Java string, that is with doubled backslashes):
".*\\/test\\/([^/]*).*"

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it ?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to match every part of the text inside the parentheses explicitly:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
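Putting the breakdown together, here is a small runnable sketch built around the [^()]* variant. SetVarValue and quotedVariable are hypothetical names, and returning null when nothing matches is an assumption of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetVarValue {
    // SETVAR, two opening parentheses, anything without parentheses,
    // then '&&name' followed by two closing parentheses.
    private static final Pattern SETVAR =
            Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");

    // Hypothetical helper: returns the name inside the single quotes, or null if absent.
    static String quotedVariable(String text) {
        Matcher m = SETVAR.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(quotedVariable("Text SETVAR((&&text1 '&&text2'))")); // text2
    }
}
```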
Your regex doesn't match your string because you didn't account for the opening parentheses. Also, \\w+ matches only runs of word characters, so it cannot match the space or the & characters.
Instead, you can use a negated character class, [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern pattern = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");

Extracting both matching and not matching regex

I have a String like this one: abc3a de'f gHi?jk. I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other words, I want to return the Strings that match the regular expression [a-zA-Z0-9'] and the Strings that do not match it. A way to tell whether each resulting substring is a match or not would be a plus.
Thanks!
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want white space.
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk
You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
OUTPUT
abc3a
de'f
gHi
?
jk
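The question also asks how to tell whether each substring matched the character set. One possible way, sketched here as an assumption rather than as part of the answer above, is to wrap each alternative in its own capturing group and check which group is non-null. TokenKinds, classify and the "match:"/"miss:" prefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenKinds {
    // Group 1: a run of allowed characters. Group 2: a run of other non-space characters.
    private static final Pattern TOKEN =
            Pattern.compile("([a-zA-Z0-9']+)|([^a-zA-Z0-9' ]+)");

    // Hypothetical helper: labels each token by the alternative that produced it.
    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null whenever the second alternative matched.
            String kind = (m.group(1) != null) ? "match" : "miss";
            out.add(kind + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("abc3a de'f gHi?jk"));
        // [match:abc3a, match:de'f, match:gHi, miss:?, match:jk]
    }
}
```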
myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at all the boundaries between runs of characters in that charset.
The lookbehind (?<=...) matches just after a character of one run, while the lookahead (?=...) matches just before a character of the next run, so together they match the zero-width boundary between a run inside the set and a run outside it.
The \\s+ is not a boundary match, and matches a run of whitespace characters. This has the effect of removing white-space from the result entirely.
The | makes the split happen at either kind of boundary or at a run of white-space.
Since the lookbehind and lookahead are both positive, the boundaries will not match at the start or end of the string, so there's no need to ignore empty strings in the output unless there is white-space there.
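As a quick check of the explanation above, this sketch applies the split expression to the string from the question. BoundarySplit, BOUNDARY and tokens are hypothetical names; the regex is the one from the answer, broken across lines for readability.

```java
import java.util.Arrays;

class BoundarySplit {
    // Split on whitespace runs, or on the zero-width boundary between
    // a character inside the set and a non-space character outside it.
    static final String BOUNDARY =
            "\\s+"
            + "|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])"
            + "|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])";

    // Hypothetical helper wrapping String.split with the expression above.
    static String[] tokens(String s) {
        return s.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("abc3a de'f gHi?jk")));
        // [abc3a, de'f, gHi, ?, jk]
    }
}
```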
You can split on lookaround assertions:
private static String[] splitString(final String s) {
final String [] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final ArrayList<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means xxx follows this position, and (?<=xxx) means xxx precedes this position.
Since you did not want all-whitespace pieces in the result, you need to filter the array returned by split.

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero width assertions to find the place where to cut the string, then I use a lookbehind (preceded by a digit [0-9]) and a lookahead (followed by a letter [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
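Here is a runnable sketch of that split, printing the resulting tokens. DigitLetterSplit and tokens are hypothetical names; note that [a-z] assumes the letters are lowercase, as they are in the sample string.

```java
import java.util.Arrays;

class DigitLetterSplit {
    // Cut wherever a digit is immediately followed by a lowercase letter.
    // The match is zero-width, so no characters are removed.
    static String[] tokens(String s) {
        return s.split("(?<=[0-9])(?=[a-z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("xyz213123kop234430099kpf4532")));
        // [xyz213123, kop234430099, kpf4532]
    }
}
```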
You could split at each position where a digit is followed by a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
Nothing in your string matches the regular expression. Java regex strings take no / delimiters, so the leading and trailing / are matched as literal slashes, which your string does not contain. Even without them, the expression starts with ^ (beginning of string) and ends with $ (end of string), so it would either match the whole string or nothing at all. Because the pattern never matches, split finds no delimiter, and that's why you get just one big token.
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)"); // A-Z, not A-z, which would also allow [ \ ] ^ _ `
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)
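The diagnosis can be checked with a few lines. This sketch is only an illustration: SplitDiagnosis and pieceCount are hypothetical names. It shows that the original expression never matches, so split returns a single piece, and that the anchored pattern describes one token rather than the whole string.

```java
class SplitDiagnosis {
    // Hypothetical helper: number of pieces String.split produces for a given regex.
    static int pieceCount(String s, String regex) {
        return s.split(regex).length;
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";

        // The slashes are literal and ^ can never follow a character,
        // so nothing matches and the input comes back as one piece.
        System.out.println(pieceCount(s, "/^[a-zA-z]+[0-9]+$/")); // 1

        // matches() tests the entire string: three tokens do not fit one letters+digits run.
        System.out.println(s.matches("[a-zA-Z]+[0-9]+"));           // false
        System.out.println("xyz213123".matches("[a-zA-Z]+[0-9]+")); // true
    }
}
```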
