Regex for extracting digits in version format - java

I want to extract numbers from a string; the numbers represent a version.
That is, I want to match numbers that appear between:
_ and /
/ and /
I have prepared the following regex, but it doesn't work as expected:
.*[\/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})\/.*
For the following example, the regex should match twice:
Input: name_1.1.1/9.10.0/abc. Expected result: 1.1.1 and 9.10.0.
However, my regex returns only 9.10.0; 1.1.1 is omitted. Do you have any idea what is wrong?

You could just split the string on _ or /, and then retain components which appear to be versions:
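import java.util.ArrayList;
import java.util.List;

List<String> versions = new ArrayList<>();
String input = "name_1.1.1/9.10.0/abc";
String[] parts = input.split("[_/]"); // ["name", "1.1.1", "9.10.0", "abc"]
for (String part : parts) {
    // keep only tokens made of digits separated by dots
    if (part.matches("\\d+(?:\\.\\d+)*")) {
        versions.add(part);
    }
}
System.out.println(versions); // [1.1.1, 9.10.0]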
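The same split-and-filter idea can also be written as a stream. A minimal sketch, assuming Java 8+ for Pattern.splitAsStream; the class name SplitVersions is hypothetical. Like the loop above, the filter also accepts a bare number such as 42; use \d+(?:\.\d+)+ instead if at least one dot is required.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SplitVersions {
    // '_' and '/' are the delimiters; tokens between them are candidate versions.
    private static final Pattern DELIMITERS = Pattern.compile("[_/]");
    // Digits optionally followed by ".digits" groups, e.g. 1.1.1 or 9.10.0.
    private static final Pattern VERSION = Pattern.compile("\\d+(?:\\.\\d+)*");

    static List<String> extract(String input) {
        return DELIMITERS.splitAsStream(input)
                .filter(part -> VERSION.matcher(part).matches()) // whole token must be a version
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extract("name_1.1.1/9.10.0/abc")); // [1.1.1, 9.10.0]
    }
}
```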

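Your pattern finds only one version because the leading .* is greedy and skips ahead to the last version, and the trailing /.* then consumes the rest of the input. You can assert the / at the end with a lookahead instead of matching it, and omit the .*, so the same / can also serve as the leading delimiter of the next version.
Note that you don't have to escape / in a Java regex: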
[/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})(?=/)
Example code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";
String string = "name_1.1.1/9.10.0/abc";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // the version without the leading delimiter
}
Output
1.1.1
9.10.0
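To see the difference, the sketch below runs the question's original pattern and the lookahead version on the same input. GreedyVsLookahead and findAll are illustrative names; the two regexes are copied from the question and the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLookahead {
    // The question's pattern: greedy .* on both ends and a consumed trailing '/'.
    static final String ORIGINAL = ".*[\\/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})\\/.*";
    // The answer's pattern: no .*, and the trailing '/' is only asserted by a lookahead.
    static final String FIXED = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";

    // Collects group 1 of every match that find() reports.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "name_1.1.1/9.10.0/abc";
        System.out.println(findAll(ORIGINAL, input)); // [9.10.0]
        System.out.println(findAll(FIXED, input));    // [1.1.1, 9.10.0]
    }
}
```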
Another option is to use a positive lookbehind to assert a / or _ on the left, so that the match itself is just the version and no capture group is needed:
(?<=[/_])\d{1,2}[.]\d{1,2}[.]\d{1,2}(?=/)
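The lookbehind variant needs no capture group, so matcher.group() returns the version directly. A sketch, with the hypothetical class name LookbehindVersions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindVersions {
    // '_' or '/' must precede the version (lookbehind) and '/' must follow it (lookahead),
    // but neither delimiter is part of the match, so group() alone returns the version.
    private static final Pattern VERSION =
            Pattern.compile("(?<=[/_])\\d{1,2}[.]\\d{1,2}[.]\\d{1,2}(?=/)");

    static List<String> extract(String input) {
        List<String> versions = new ArrayList<>();
        Matcher m = VERSION.matcher(input);
        while (m.find()) {
            versions.add(m.group());
        }
        return versions;
    }

    public static void main(String[] args) {
        System.out.println(extract("name_1.1.1/9.10.0/abc")); // [1.1.1, 9.10.0]
    }
}
```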

Code Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(\\d+\\.\\d+\\.\\d+)"; // dots escaped so they match only a literal '.'
String string = "name_1.1.1/9.10.0/abc";
String string2 = "randomversion4.5.6/09.7.8_9.88.9";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
Matcher matcher2 = pattern.matcher(string2);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
while (matcher2.find()) {
    System.out.println(matcher2.group(1));
}
Output:
1.1.1
9.10.0
4.5.6
09.7.8
9.88.9
Just write a regex for what you want to match, in this case only the version number.
A regex can be used either to match a whole string or to find a substring within a string.
When you use a regex to find a substring, you usually cannot anticipate every filename or surrounding string, so match only the part you are looking for.
This way you can find the versions no matter which string they appear in. Note that this pattern ignores the _ and / delimiters, so it also finds versions attached to other text, such as 4.5.6 in randomversion4.5.6.
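An unescaped . matches any character, so a pattern like (\d+.\d+.\d+) also accepts digits separated by hyphens or letters; escaping the dot restricts it to real dots. The sketch below compares both forms on a made-up input (the class name DotEscaping and the input string are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotEscaping {
    // Unescaped '.' matches any character, so "1-2-3" also qualifies.
    static final String LOOSE = "(\\d+.\\d+.\\d+)";
    // Escaped '\\.' matches only a literal dot.
    static final String STRICT = "(\\d+\\.\\d+\\.\\d+)";

    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "build 1-2-3 and version 4.5.6";
        System.out.println(findAll(LOOSE, input));  // [1-2-3, 4.5.6]
        System.out.println(findAll(STRICT, input)); // [4.5.6]
    }
}
```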

Related

Parse string using Java Regex Pattern?

I have a Java string in the following format:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Pattern and Matcher classes from the java.util.regex package, I need to produce an output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use this regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
    // group(1) is the city code, group(2) the distance
    result.append("[");
    result.append(matcher.group(1));
    result.append(":");
    result.append(matcher.group(2));
    result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
The pattern \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: group 1 holds the uppercase letters after [name:, and group 2 holds the digits after [distance:.
Another solution, from tradeJmark, is to use named groups:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
Then you can retrieve each group by its name instead of its index:
while (matcher.find()) {
    result.append("[");
    result.append(matcher.group("name"));     // group looked up by name
    result.append(":");
    result.append(matcher.group("distance")); // group looked up by name
    result.append("]");
}
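On Java 9 or later, the same loop can be expressed with Matcher.results(), which streams each MatchResult. A minimal sketch (the class name CityDistances is hypothetical); it uses numeric group indices because older versions of the MatchResult interface have no group(String) method.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CityDistances {
    private static final Pattern PAIR =
            Pattern.compile("\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]");

    // Matcher.results() streams every MatchResult that repeated find() calls would produce.
    static String format(String s) {
        return PAIR.matcher(s).results()
                .map(r -> "[" + r.group(1) + ":" + r.group(2) + "]")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
        System.out.println(format(s)); // [NYK:1100][CLT:2300][KTY:3540]
    }
}
```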
If the format of the string is fixed and there are always exactly three [name:...][distance:...] pairs, you can define a block that matches one pair and captures its two parts into separate groups, and then use quite simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
The block pattern matches:
\\s* - zero or more whitespace characters
\\[name: - a literal [name: substring
([A-Z]+) - Group n, capturing one or more uppercase ASCII letters (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m, capturing one or more digits
] - a literal ] symbol.
Because the block appears three times in the .*%1$s%1$s%1$s.* pattern, the capturing groups are numbered 1 to 6 (referenced as $1 to $6 in the replacement string), and the leading and trailing .* consume the start and end of the string so that they are dropped from the result (add (?s) at the start of the pattern if the string can contain line breaks).
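To make the %1$s trick concrete, the sketch below builds the pattern, checks that it contains the block three times with six capturing groups, and shows what happens when the input has only two pairs. RepeatedBlock and convert are illustrative names.

```java
import java.util.regex.Pattern;

final class RepeatedBlock {
    // One [name:...][distance:...] pair, with optional leading whitespace.
    static final String BLOCK = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
    // %1$s reuses the first argument, so BLOCK appears exactly three times.
    static final String REGEX = String.format(".*%1$s%1$s%1$s.*", BLOCK);

    static String convert(String s) {
        return s.replaceAll(REGEX, "[$1:$2][$3:$4][$5:$6]");
    }

    public static void main(String[] args) {
        System.out.println(REGEX.equals(".*" + BLOCK + BLOCK + BLOCK + ".*")); // true
        System.out.println(Pattern.compile(REGEX).matcher("").groupCount());   // 6
        // Exactly three pairs: rewritten as expected.
        System.out.println(convert("City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:"));
        // Only two pairs: the pattern cannot match, so replaceAll returns the input unchanged.
        System.out.println(convert("City: [name:NYK][distance:1100] [name:CLT][distance:2300] Price:"));
    }
}
```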

How to check if specific pattern precedes some character?

I am new to Java regex and I couldn't find an answer.
This is my regex: -?\\d*\\.?\\d+(?!i)
and I want it not to match a string such as 551i.
This is my method:
private static double regexMatcher(String s, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(s.replaceAll("\\s+", ""));
    if (!matcher.find()) {
        return 0;
    }
    String found = matcher.group();
    return Double.parseDouble(found);
}
I want this method to return 0.0 but it keeps returning 55.0.
What am I doing wrong?
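With your pattern, (?!i) fails right after 551, so the engine backtracks and retries with fewer digits until it reaches 55, which is followed by 1 rather than i, so the lookahead succeeds. Use an atomic group to avoid backtracking into the whole digit-dot-digit part of the pattern: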
"-?(?>\\d*\\.?\\d+)(?!i)"
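The effect of the atomic group can be checked directly. In this sketch, firstNumber mirrors the question's regexMatcher method (the class and method names are illustrative); it runs both patterns on 551i and the atomic one on an ordinary number:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AtomicNumber {
    static final String BACKTRACKING = "-?\\d*\\.?\\d+(?!i)";
    static final String ATOMIC = "-?(?>\\d*\\.?\\d+)(?!i)";

    // Same logic as the question's method: first match parsed as a double, or 0 if none.
    static double firstNumber(String s, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(s.replaceAll("\\s+", ""));
        return matcher.find() ? Double.parseDouble(matcher.group()) : 0;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("551i", BACKTRACKING)); // 55.0: the digits were given back
        System.out.println(firstNumber("551i", ATOMIC));       // 0.0: no backtracking into the group
        System.out.println(firstNumber("12.5", ATOMIC));       // 12.5
    }
}
```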

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both inputs
You need to capture all characters other than / after /test/:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape /, since it is not a special character and Java patterns are not wrapped in regex delimiters.
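Applied to both inputs from the question, the pattern and the brace-stripping variant from the code comment behave like this (a sketch; TestSegment and firstGroup are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TestSegment {
    // Group 1: everything after "/test/" up to the next '/' or the end of the input.
    static final Pattern SEGMENT = Pattern.compile("/test/([^/]+)");
    // Variant from the code comment: skip the '{' and stop before '}' as well.
    static final Pattern INSIDE_BRACES = Pattern.compile("/test/\\{([^/}]+)");

    static String firstGroup(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/something/test/{abc}/listed", "/something/test/{abc}"}) {
            System.out.println(firstGroup(SEGMENT, s) + " " + firstGroup(INSIDE_BRACES, s));
        }
        // prints "{abc} abc" twice
    }
}
```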
This should work for you:
public static void main(String[] args) {
    String s1 = "/something/test/{abc}/listed";
    String s2 = "/something/test/{abc}";
    // skip everything before the first '{', keep the {word} block, drop the rest
    System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
    System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
Output:
{abc}
{abc}
Regex (as a Java string literal, that is, with doubled backslashes):
".*\\/test\\/([^/]*).*"
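This pattern is meant to match the whole string, so it fits String.replaceAll with $1 as the replacement. A sketch (the class and method names are illustrative), including what happens when /test/ is absent:

```java
final class ReplaceAllSegment {
    // The answer's pattern; "\\/" is a legal but unnecessary escape in Java.
    static final String REGEX = ".*\\/test\\/([^/]*).*";

    static String extract(String s) {
        // replaceAll keeps only group 1; if nothing matches, s comes back unchanged.
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("/something/test/{abc}/listed")); // {abc}
        System.out.println(extract("/something/test/{abc}"));        // {abc}
        System.out.println(extract("/no/match/here"));               // /no/match/here
    }
}
```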

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
    result = matcher.group(1);
}
System.out.println(result);
One way to do it is to match the whole pattern inside the parentheses:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Text SETVAR((&&text1 '&&text2'))";
// [(]{2} and [)]{2} match the literal double parentheses
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
    result = matcher.group(1);
}
System.out.println(result);
You can also use [^()]* inside the parentheses to skip straight to the value inside the single quotes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
// [^()]* replaces &&\\w+\\s* from the previous pattern
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
Your regex doesn't match your string because you didn't account for the opening parentheses; also, \\w+ matches only word characters, so it cannot match the space and & characters.
Instead, you can use the negated character class [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
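A quick side-by-side check of the question's pattern and the [^']+ version (SetVarValue and extract are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SetVarValue {
    // The question's pattern: \w+ cannot match "((", so find() never succeeds.
    static final String ORIGINAL = "SETVAR\\w+&&(\\w+)'\\)\\)";
    // [^']+ consumes "&&text1 " (everything up to the first single quote).
    static final String NEGATED_CLASS = "SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)";

    static String extract(String regex, String str) {
        Matcher matcher = Pattern.compile(regex).matcher(str);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        System.out.println(extract(ORIGINAL, str));      // null
        System.out.println(extract(NEGATED_CLASS, str)); // text2
    }
}
```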

Splitting and Parsing formula String

I have the formula below:
(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)
I want to split it and output only the string values after each colon.
The final output should be:
{ "BAO","COUNT","UPI","SOS" }
Thanks in advance,
You can use a positive lookbehind in the regex pattern below to get all the alphanumeric characters after each colon:
(?<=:)[^\W]+
Pattern explanation:
(?<= look behind to see if there is:
: ':'
) end of look-behind
[^\W]+ any character except a non-word character, i.e. a word character (a-z, A-Z, 0-9, _), 1 or more times
Sample code:
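import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)";
Pattern p = Pattern.compile("(?<=:)[^\\W]+");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group()); // prints BAO, COUNT, UPI, SOS on separate lines
}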
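To get the result as a list, matching the expected { "BAO","COUNT","UPI","SOS" }, the matches can be collected with Matcher.results(). A sketch assuming Java 9+; FormulaNames is a hypothetical class name, and \w+ is used as the equivalent of [^\W]+:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FormulaNames {
    // (?<=:) requires a colon immediately before the match without consuming it.
    private static final Pattern AFTER_COLON = Pattern.compile("(?<=:)\\w+");

    static List<String> names(String formula) {
        return AFTER_COLON.matcher(formula).results() // Java 9+
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)";
        System.out.println(names(str)); // [BAO, COUNT, UPI, SOS]
    }
}
```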
Use a regex; try this:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> extractSubstringsFromAllMatches(String sourceString, String pattern) {
    Pattern regexPattern = Pattern.compile(pattern);
    Matcher matcher = regexPattern.matcher(sourceString);
    List<String> matches = new ArrayList<>();
    while (matcher.find()) {
        matches.add(matcher.group(1)); // group 1 of each match
    }
    return matches;
}
Get the results you require by calling:
extractSubstringsFromAllMatches(YourString,":(\\w*)\\W")
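A self-contained version of the helper, run on the formula, also shows one caveat: because the pattern ends in \W, a name at the very end of the string (with no character after it) is not found. The class name AllMatches and the input A:X+B:Y are illustrative; :(\w+) is one possible alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllMatches {
    // Same idea as the helper above: collect group 1 of every match.
    static List<String> extractSubstringsFromAllMatches(String sourceString, String pattern) {
        Matcher matcher = Pattern.compile(pattern).matcher(sourceString);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String str = "(Trig01:BAO)/(((Trig01:COUNT*86400)-Trig01:UPI-Trig01:SOS)*2000)";
        System.out.println(extractSubstringsFromAllMatches(str, ":(\\w*)\\W")); // [BAO, COUNT, UPI, SOS]
        // The trailing \W needs a character after the name, so Y at the very end is missed.
        System.out.println(extractSubstringsFromAllMatches("A:X+B:Y", ":(\\w*)\\W")); // [X]
        System.out.println(extractSubstringsFromAllMatches("A:X+B:Y", ":(\\w+)"));    // [X, Y]
    }
}
```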
Try this one-line solution:
String[] arr = str.replaceAll("^.*?(?=\\w+:)|:[^:]*$", "").split(":.*?(?=\\w+(:|$))");
This works by first stripping off the leading and trailing non-target characters, then splitting on the characters in between. The matching uses lookaheads, which assert, without consuming, that a word followed by a colon comes next. Note that this returns the identifiers before each colon (the Trig prefixes), not the names after it.
Here's some test code:
import java.util.Arrays;

String str = "(Trig01:BAO)/(((Trig02:COUNT*86400)-Trig03:UPI-Trig04:SOS)*2000)";
String[] arr = str.replaceAll("^.*?(?=\\w+:)|:[^:]*$", "").split(":.*?(?=\\w+(:|$))");
System.out.println(Arrays.toString(arr));
Output:
[Trig01, Trig02, Trig03, Trig04]
