Extract values from string using regex groups - java

I have to extract values from string using regex groups.
Inputs are like this,
-> 1
-> 5.2
-> 1(2)
-> 3(*)
-> 2(3).2
-> 1(*).5
Now I write following code for getting values from these inputs.
String stringToSearch = "2(3).2";
Pattern p = Pattern.compile("(\\d+)(\\.|\\()(\\d+|\\*)\\)(\\.)(\\d+)");
Matcher m = p.matcher(stringToSearch);
if (m.matches()) { // a match must be attempted before group() can be called
System.out.println("1: "+m.group(1)); // O/P: 2
System.out.println("3: "+m.group(3)); // O/P: 3
System.out.println("5: "+m.group(5)); // O/P: 2
}
But my problem is that only the first group is compulsory; the others are optional.
That's why I need a regex that checks all of these patterns and extracts the values.

Use non-capturing groups and make them optional by adding the ? quantifier after each group.
^(\d+)(?:\((\d+|\*)\))?(?:\.(\d+))?$
Java regex would be,
"(?m)^(\\d+)(?:\\((\\d+|\\*)\\))?(?:\\.(\\d+))?$"
Example:
String input = "1\n" +
"5.2\n" +
"1(2)\n" +
"3(*)\n" +
"2(3).2\n" +
"1(*).5";
Matcher m = Pattern.compile("(?m)^(\\d+)(?:\\((\\d+|\\*)\\))?(?:\\.(\\d+))?$").matcher(input);
while(m.find())
{
if (m.group(1) != null)
System.out.println(m.group(1));
if (m.group(2) != null)
System.out.println(m.group(2));
if (m.group(3) != null)
System.out.println(m.group(3));
}
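When each input arrives as its own string rather than as one multi-line block, the same pattern can be applied with matches(), which makes the ^, $ and (?m) parts unnecessary. The following is only a sketch under that assumption; the class name OptionalGroupsDemo and the method extract are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalGroupsDemo {
    // Group 1 is required; groups 2 and 3 sit inside optional non-capturing groups.
    private static final Pattern VALUE =
            Pattern.compile("(\\d+)(?:\\((\\d+|\\*)\\))?(?:\\.(\\d+))?");

    // Returns the captured values in order, skipping groups that did not participate.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        if (m.matches()) { // the whole input must fit the pattern
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) {
                    values.add(m.group(i));
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "5.2", "1(2)", "3(*)", "2(3).2", "1(*).5"}) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```

An input that does not fit the pattern at all yields an empty list, so the caller can tell "no match" apart from "only the first group matched".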

Here is an alternate approach that is simpler to understand.
First replace all non-digit, non-* characters by a colon
Split by :
Code:
String repl = input.replaceAll("[^\\d*]+", ":");
String[] tok = repl.split(":");
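A runnable sketch of the replace-then-split idea, mainly to show one trade-off: the tokens no longer record whether a value came from the parentheses or from after the dot. The class name SplitTokensDemo and the method tokens are hypothetical.

```java
import java.util.Arrays;

final class SplitTokensDemo {
    // Collapses every run of characters that are neither digits nor '*' into ':',
    // then splits on ':'. Trailing empty strings are dropped by split().
    static String[] tokens(String input) {
        String repl = input.replaceAll("[^\\d*]+", ":");
        return repl.split(":");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("2(3).2"))); // [2, 3, 2]
        System.out.println(Arrays.toString(tokens("1(*).5"))); // [1, *, 5]
        // Positional information is lost: both inputs below give [1, 2].
        System.out.println(Arrays.toString(tokens("1(2)")));
        System.out.println(Arrays.toString(tokens("1.2")));
    }
}
```

If the caller needs to know which optional part was present, the capturing-group approach is the safer choice.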

Related

Using Regular Expression in Java to extract information from a String

I have one input String like this:
"I am Duc/N Ta/N Van/N"
The "/N" suffix marks a word as part of a person's name.
The expected output is:
Name: Duc Ta Van
How can I do it by using regular expression?
You can use Pattern and Matcher like this :
String input = "I am Duc/N Ta/N Van/N";
Pattern pattern = Pattern.compile("([^\\s]+)/N");
Matcher matcher = pattern.matcher(input);
String result = "";
while (matcher.find()) {
result+= matcher.group(1) + " ";
}
System.out.println("Name: " + result.trim());
Output
Name: Duc Ta Van
Another Solution using Java 9+
From Java 9 onwards you can use Matcher::results like this:
String input = "I am Duc/N Ta/N Van/N";
String regex = "([^\\s]+)/N";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.results().map(s -> s.group(1)).collect(Collectors.joining(" "));
System.out.println("Name: " + result); // Name: Duc Ta Van
Here is the regex to use to capture every "name" followed by /N
(\w+)\/N
Now you just need to loop over every match in that String and concatenate them to get the result:
String pattern = "(\\w+)\\/N";
String test = "I am Duc/N Ta/N Van/N";
Matcher m = Pattern.compile(pattern).matcher(test);
StringBuilder sbNames = new StringBuilder();
while(m.find()){
sbNames.append(m.group(1)).append(" ");
}
System.out.println(sbNames.toString());
Duc Ta Van
That covers the hardest part; I'll let you adapt it to your needs.
Note:
In Java, a forward slash does not need to be escaped. I kept "(\\w+)\\/N" so that the same regex is used throughout the answer, but "(\\w+)/N" works as well.
I've used "[/N]+" as the regular expression.
[] = Matches characters inside the set
\/ = Matches the character / literally (case sensitive)
+ = Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

Regex in Java not working while same regex is working in shell

I want to replace all :variable (word starting with :) with ${variable}$.
For example,
:aks_num with ${aks_num}$
:brn_num with ${brn_num}$
Following is my code, which does not work:
public static void main(String[] argv) throws Exception
{
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":\\([a-z_]*\\)");
Matcher m = p.matcher(chSeq);
if (m.find()) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
While in shell script the following regex works perfectly:
s/:\([a-z_]*\)/${\1}$/g
:\\([a-z_]*\\) (with escaped parentheses) means that you want to match expressions like :(aks_num). There is no such expression in the input string, which explains why there are no matches.
Instead, if you want to use parentheses to capture the variable names, you should not escape them.
Example :
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
Pattern p = Pattern.compile(":([a-z_]*)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(0)+". Captured : "+m.group(1));
}
Output:
Found value: :aks_num. Captured : aks_num
Found value: :aks_num. Captured : aks_num
Found value: :brn_num. Captured : brn_num
Found value: :brn_num. Captured : brn_num
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(1) );
}
It works fine with replaceAll, as long as the colon stays outside the capturing group:
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(x);
x = m.replaceAll("\\${$1}\\$");
You don't need to escape the parentheses, so
Pattern.compile(":([a-z_]*)");
should work.
I believe you confused Java's regex syntax with sed's basic regex syntax, which is different. In Java you do not need to escape parentheses to make them grouping operators. On the contrary, escaped parentheses match literal ( and ) symbols.
In the replacement pattern, $ must be escaped for the regex engine to replace with literal $ symbols, but you do not need to escape braces there.
So, just use
.replaceAll(":([a-z_]+)", "\\${$1}\\$")
I suggest the + quantifier because, with *, the pattern also matches a lone : followed by a space, a digit, or any other non-letter, which you probably do not want.
BTW, you do not need any /g flag in Java since replaceAll will replace all matches with the provided replacement pattern.
NOTE: you can further adjust the pattern to match all letters/digits/underscores with ":(\\w+)". Or just alphanumerics/underscore: ":([\\p{Alnum}_]+)".
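Putting the pieces of this answer together, a complete program might look like the sketch below. It uses the ":(\\w+)" variant; the class name PlaceholderDemo and the method toPlaceholders are hypothetical, and the input is a shortened version of the string from the question.

```java
final class PlaceholderDemo {
    // Rewrites every :name token as ${name}$.
    // In the replacement string, \\$ is a literal dollar sign and $1 is the captured name.
    static String toPlaceholders(String sql) {
        return sql.replaceAll(":(\\w+)", "\\${$1}\\$");
    }

    public static void main(String[] args) {
        String sql = "(:aks_num = -1) OR (brn_num = :brn_num)";
        // prints: (${aks_num}$ = -1) OR (brn_num = ${brn_num}$)
        System.out.println(toPlaceholders(sql));
    }
}
```

Words without a leading colon, such as the first brn_num, are left untouched because the colon is part of the match but not of the captured group.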

Java : Splitting a String using Regex

I have to split a string using comma(,) as a separator and ignore any comma that is inside quotes(")
fieldSeparator : ,
fieldGrouper : "
The string to split is : "1","2",3,"4,5"
I am able to achieve it as follows :
String record = "\"1\",\"2\",3,\"4,5\"";
String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
Output :
"1"
"2"
3
"4,5"
Now the challenge is that the fieldGrouper(") should not be a part of the split tokens. I am unable to figure out the regex for this.
The expected output of the split is :
1
2
3
4,5
Update:
String[] tokens = record.split( "(,*\",*\"*)" );
Initial solution (does not work with the .split method):
This regex pattern will isolate the sections you want:
(?:\\")(.*?)(?:\\")
It uses non-capturing groups to isolate the pairs of escaped quotes,
and a capturing group to isolate everything in between.
My suggestion:
"([^"]+)"|(?<=,|^)([^,]*)
See the regex demo. It will match "..." like strings and capture into Group 1 only what is in-between the quotes, and then will match and capture into Group 2 sequences of characters other than , at the start of a string or after a comma.
Here is a Java sample code:
String s = "value1,\"1\",\"2\",3,\"4,5\",value2";
Pattern pattern = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<String>();
while (matcher.find()){ // Run the matcher
if (matcher.group(1) != null) { // If Group 1 matched
res.add(matcher.group(1)); // Add it to the resulting array
} else {
res.add(matcher.group(2)); // Add Group 2 as it got matched
}
}
System.out.println(res); // => [value1, 1, 2, 3, 4,5, value2]
I would try with this kind of workaround:
String record = "\"1\",\"2\",3,\"4,5\"";
record = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\""," ");
String[] tokens = record.trim().split(" ");
for(String str : tokens){
System.out.println(str);
}
Output:
1
2
3
4,5
My proposition:
record = record.replaceAll("\",", "|");
record = record.replaceAll(",\\\"", "|");
record = record.replaceAll("\"", "");
String[] tokens = record.split("\\|");
for (String token : tokens) {
System.out.println(token);
}
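Another option, not taken from any of the answers above, is to keep the lookahead split from the question and strip the surrounding quotes from each token afterwards. The sketch below assumes that quotes only ever wrap a whole field and are always balanced; the class name QuotedSplitDemo and the method splitRecord are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedSplitDemo {
    // Splits on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted field.
    private static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        for (String token : record.split(SEPARATOR)) {
            // Remove one pair of enclosing quotes, if present.
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                token = token.substring(1, token.length() - 1);
            }
            fields.add(token);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints: [1, 2, 3, 4,5]
        System.out.println(splitRecord("\"1\",\"2\",3,\"4,5\""));
    }
}
```

For real CSV data with escaped quotes or embedded newlines, a dedicated CSV parser is likely to be more robust than any single regex.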

Find multiple string matches using Java regex

I am trying to use regex to find a match for a string between Si and (P) or Si and (I).
Below is what I wrote. Why isn't it working and how do I fix it?
String Channel = "Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
if (Channel.length() > 0) {
String pattern1 = "Si";
String pattern2 = "(P)";
String pattern3 = "(I)";
String P1 = Pattern.quote(pattern1) + "(.*?)[" + Pattern.quote(pattern2) + "|" + Pattern.quote(pattern3) + "]";
Pattern p = Pattern.compile(P1);
Matcher m = p.matcher(Channel);
while(m.find()){
if (m.group(1)!= null)
{
System.out.println(m.group(1));
}
else if (m.group(2)!= null)
{
System.out.println(m.group(2));
}
}
}
Expected output
0/4
0/5
Actual output
0/4
0/6
0/8K Si0/5
Use a lookbehind and a lookahead in your regex. You also need to add a space inside the negated character class so that it won't match the 0/8K string.
(?<=Si)[^\\( ]*(?=\\((?:P|I)\\))
String str="Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
String regex="(?<=Si)[^\\( ]*(?=\\([PI]\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher =pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(0));
}
Output:
0/4
0/5
You need to group your regex. It is currently
Si(.*?)[(P)|(I)]
Whereas it should be
Si(.*?)\(I\)|Si(.*?)\(P\)
See demo.
http://regex101.com/r/oO8zI4/8
[] means "any one of these characters", so every character inside the brackets is treated as if it were separated from the others by OR.
If the result you're searching is always: number/number
You can use:
Si(\d+\/\d+)(?:\(P\)|\(I\))
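A short sketch of that last pattern in Java, assuming the value between Si and the suffix is always number/number as stated above. The class name ChannelDemo and the method channels are hypothetical; the forward slash is left unescaped because Java does not require it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChannelDemo {
    // Alternation inside a non-capturing group, instead of a character class.
    private static final Pattern CHANNEL =
            Pattern.compile("Si(\\d+/\\d+)(?:\\(P\\)|\\(I\\))");

    static List<String> channels(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = CHANNEL.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [0/4, 0/5]
        System.out.println(channels("Si0/4(I) Si0/6( Si0/8K Si0/5(P)"));
        // The original character class matches any single one of ( P ) | I.
        System.out.println("|".matches("[(P)|(I)]")); // prints: true
    }
}
```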

REGEX : How to escape []?

I'm working on strings like "[ro.multiboot]: [1]". How do I just select 1(it can also be 0) out of this string?
I am looking for a regex in Java.
Usually, you would do something like (assuming 0 and 1 were the only options):
^.*\[([01])\].*$
If you only wanted the value for ro.multiboot, you could change it to something like:
^.*\[ro.multiboot\].*\[([01])\].*$
(depending on how complex any of the non-bracketed stuff is allowed to be).
These would both basically only extract the value between square brackets if it were zero or one, and capture it into a capture variable so you could use it.
Of course, regex is not a world-wide standard, nor are the environments in which you use it. That means it depends a lot on your actual environment how you will actually code this up.
For Java, the following sample program may help:
import java.util.regex.*;
class Test {
public static void main(String args[]) {
Pattern p = Pattern.compile("^.*\\[ro.multiboot\\].*\\[([01])\\].*$");
String str;
Matcher m;
str = "[ro.multiboot]: [0]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str0 has " + m.group(1));
}
str = "[ro.multiboot]: [1]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str1 has " + m.group(1));
}
str = "[ro.multiboot]: [2]";
m = p.matcher (str);
if (m.find()) {
System.out.println ("str2 has " + m.group(1));
}
}
}
This results in (as expected):
str0 has 0
str1 has 1
@paxdiablo's regexps are correct, but a complete answer to "How do I just select 1 (it can also be 0) out of this string?" is:
1. very simple solution
String input = "[ro.multiboot]: [1]";
String matched = input.replaceFirst( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$", "$1" );
2. same functionality, more complicated but with better performance
String input = "[ro.multiboot]: [1]";
Pattern p = Pattern.compile( "^.*\\[ro.multiboot\\].*\\[([01])\\].*$" );
Matcher m = p.matcher( input );
String matched = null;
if ( m.matches() ) matched = m.group( 1 );
Performance is better because the pattern is compiled just once (for example, when you are matching an array of such strings).
Notes:
In both examples, the group is the part of the regex between an unescaped ( and ).
In Java source code you have to write \\[, because \[ is a compile error: it is not a valid escape sequence in a string literal.
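When the bracketed key is a fixed literal, Pattern.quote can do the escaping instead of hand-written backslashes, and it also makes the dot in ro.multiboot literal. This is a sketch only: the class name BracketEscapeDemo and the method flag are hypothetical, and it assumes the line has exactly the form "[ro.multiboot]: [0]" or "[ro.multiboot]: [1]".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketEscapeDemo {
    // Pattern.quote wraps the literal so that [, ] and . lose their special meaning.
    private static final Pattern FLAG = Pattern.compile(
            Pattern.quote("[ro.multiboot]") + ":\\s*\\[([01])\\]");

    // Returns "0" or "1", or null when the line does not have the expected form.
    static String flag(String line) {
        Matcher m = FLAG.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(flag("[ro.multiboot]: [1]")); // 1
        System.out.println(flag("[ro.multiboot]: [2]")); // null
    }
}
```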
