Java regex to extract only the single-digit numbers from a string

Let's say I have a string:
String str = "Hello6 9World 2, Nic8e D7ay!";
Matcher match = Pattern.compile("\\d+").matcher(str);
The line above would give me 6, 9, 2, 8 and 7, which is perfect!
But suppose my string changes to:
String str = "Hello69World 2, Nic8e D7ay!";
Note that the space between 6 and 9 has been removed in this string.
If I run
Matcher match = Pattern.compile("\\d+").matcher(str);
it would give me 69, 2, 8 and 7.
My requirement is to extract the single-digit numbers only. Here, I need 2, 8 and 7, and want to omit 69.
Could you please help me improve my regex? Thank you!

For each digit, you have to check that it is neither preceded nor followed by another digit.
You can try this:
public static void main(String[] args) {
String str = "Hello69World 2, Nic8e D7ay!";
Pattern p = Pattern.compile("(?<!\\d)\\d(?!\\d)");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group());
}
System.out.println("***********");
str = "Hello6 9World 2, Nic8e D7ay!";
m = p.matcher(str);
while (m.find()) {
System.out.println(m.group());
}
}
Output:
2
8
7
***********
6
9
2
8
7
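The lookaround pattern above can also be wrapped in a small helper that returns the matches as a list instead of printing them. This is only a sketch: the class name SingleDigits and the method extract are made up for illustration and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleDigits {
    // One digit that has no digit immediately before it and none immediately after it.
    private static final Pattern SINGLE_DIGIT = Pattern.compile("(?<!\\d)\\d(?!\\d)");

    // Hypothetical helper: collects every isolated digit in input, in order.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SINGLE_DIGIT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hello69World 2, Nic8e D7ay!"));  // [2, 8, 7]
        System.out.println(extract("Hello6 9World 2, Nic8e D7ay!")); // [6, 9, 2, 8, 7]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call.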

Related

Regex to extract value from a predefined String formats

I'm new to writing regexes. I know we can use the Pattern and Matcher classes to compile a regex and find matches in a String.
But I'm not sure how to create the regex for my problem, which is as follows.
For example, if String str = "T2(123)", then my regex should return 123, where T2 is always constant and only the value 123 changes.
Similarly, if String str = "T2(23)K3(11)", then it should return 23+11 = 34, where T2 and K3 are constants.
I'm thinking of treating T2(#) and T2(#)K3(#) as my tokens, comparing my input string with these tokens, and returning the value of # or sum(#).
But I'm not sure how to do this using regex.
int sum = 0;
String type = "T2(23)";
String pttrn = "(?<=T2\\()\\d+(?=\\))";
Pattern p = Pattern.compile(pttrn);
Matcher m = p.matcher(type);
while (m.find()) {
sum += Integer.parseInt(m.group());
}
System.out.println(sum);
I have tried the above code and it returns 23, but it is not working for T2(#)K3(#) type.
Seems you just want the numbers inside parentheses, and ignore the rest, so:
\((\d+)\)
static void test(String input) {
Pattern p = Pattern.compile("\\((\\d+)\\)");
Matcher m = p.matcher(input);
int sum = 0;
while (m.find()) {
sum += Integer.parseInt(m.group(1));
}
System.out.println(sum);
}
public static void main(String[] args) {
test("T2(23)"); // prints: 23
test("T2(123)"); // prints: 123
test("T2(23)K3(11)"); // prints: 34
test("T2(23)K3(11)U4(42)"); // prints: 76
}
Note: The above regex is following the KISS principle ("keep it simple, stupid").
Here's a generalization without using lookarounds:
String input = "T2(23)K3(11)U4(42)";
// matches 1 uppercase letter, 1 digit, and captures (in group 1)
// a digit sequence of any length between the parentheses (which are excluded)
Pattern pattern = Pattern.compile("\\p{Upper}\\d\\((\\d+)\\)");
Matcher matcher = pattern.matcher(input);
int total = 0;
// iterates occurrences and sums
while (matcher.find()) {
// group 1 is always a digit sequence, so it parses unless it exceeds the int range
// the running total is not protected against arithmetic overflow, though
total += Integer.parseInt(matcher.group(1));
}
Your total would be 76 here.
Note
As posted by Andreas, a looser requirement (only digits between parentheses) begets a simpler pattern.
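The overflow caveat mentioned in the code comments can be addressed by summing into a long with Math.addExact, which throws ArithmeticException instead of silently wrapping around. The following is a minimal sketch under that assumption; the class name ParenSum and the method sum are hypothetical, not part of either answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenSum {
    // Digits enclosed in parentheses; group 1 holds the digits only.
    private static final Pattern IN_PARENS = Pattern.compile("\\((\\d+)\\)");

    // Hypothetical helper: sums every parenthesized number in input.
    static long sum(String input) {
        long total = 0;
        Matcher m = IN_PARENS.matcher(input);
        while (m.find()) {
            // Long.parseLong throws NumberFormatException if the digits exceed the long range;
            // Math.addExact throws ArithmeticException if the running sum overflows.
            total = Math.addExact(total, Long.parseLong(m.group(1)));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("T2(23)K3(11)"));       // 34
        System.out.println(sum("T2(23)K3(11)U4(42)")); // 76
    }
}
```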

separate mathematical expression in java

Would anyone be able to help me with separating this mathematical expression 125*1*4*4+82*1*10+2*59+2+4 in Java? I want to get the numbers from the expression, and I'm not sure how to use the split() method here.
You can use split with this regex [*+] :
String[] numbers = "125*1*4*4+82*1*10+2*59+2+4".split("[*+]");
Outputs
[125, 1, 4, 4, 82, 1, 10, 2, 59, 2, 4]
If the mathematical expression may contain spaces, you can remove them first and then split:
String[] numbers = "125*1*4*4+82*1*10+2*59+2+4".replaceAll("\\s+", "").split("[*+]");
//---------------------------------------------^----------------------^
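Note that you can add other arithmetic operators, for example [*+/-]. Put the - first or last in the character class; written as [*+-/], the +-/ part is read as a range that also matches , and .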
Another solution from eparvan:
In case you are not sure what the expression can contain you can use :
String[] numbers = "125*1*4*4+82*1*10+2*59+2+4".replaceAll("\\s+", "").split("[^0-9]");
//----------------------------------------------------------------------------^----^
Edit
What if I want to get the "+" and "*" instead?
Input: 125*1*4*4+82*1*10+2*59+2+4 Output: ***+**+*++
In this case you can split on \d+ like this (note that the resulting array starts with an empty string, because the input begins with a number):
String[] operators = "125*1*4*4+82*1*10+2*59+2+4".replaceAll("\\s+", "").split("\\d+");
But I would prefer to use Pattern, which is more practical than split here. For example:
String str = "125*1/4*4+82*1*10+2/59-2+4";
String regex = "[^\\d]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
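If both the numbers and the operators are needed in their original order, a single pattern with two alternatives can tokenize the expression in one pass. This is a sketch only; the class name ExpressionTokens and the method tokenize are invented for illustration, and it assumes unsigned integers and the four operators + - * /.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokens {
    // Either a run of digits or a single operator; the leading '-' in the class is a literal.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/]");

    // Hypothetical helper: returns numbers and operators in the order they appear.
    // Characters that are neither digits nor operators (such as spaces) are skipped by find().
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("2*59+2"));  // [2, *, 59, +, 2]
        System.out.println(tokenize("12 - 3"));  // [12, -, 3]
    }
}
```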

Regex to split parameters

I am looking for a regular expression to split a string on commas. That sounds very simple, but there is another restriction: the parameters in the string may contain commas enclosed in parentheses, and those commas should not split the string.
Example:
1, 2, 3, add(4, 5, 6), 7, 8
^ ^ ^ ! ! ^ ^
The string should only be split at the commas marked with ^ and not at those marked with !.
I found a solution for it here: A regex to match a comma that isn't surrounded by quotes
Regex:
,(?=([^\(]*\([^\)]*\))*[^\)]*$)
But my string could be more complex:
1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11
^ ^ ^ ! ! ! ! ! ^ ^
For this string the result is wrong, and I have no clue how to fix this or whether it is even possible with regular expressions.
Does anyone have an idea how to resolve this problem?
Thanks for your help!
Ok, I think a regular expression is not very useful for this. A small block of java might be easier.
So this is my java code for solving the problem:
public static void splitWithJava() {
String EXAMPLE = "1, 2, 3, add(4, 5, add(7, 8), 6), 7, 8";
List<String> list = new ArrayList<>();
int start = 0;
int pCount = 0;
for (int i = 0; i < EXAMPLE.length(); i++) {
char c = EXAMPLE.charAt(i);
switch (c) {
case ',': {
if (0 == pCount) {
list.add(EXAMPLE.substring(start, i).trim());
start = i + 1;
}
break;
}
case '(': {
pCount++;
break;
}
case ')': {
pCount--;
break;
}
}
}
list.add(EXAMPLE.substring(start).trim());
for (String str : list) {
System.out.println(str);
}
}
You can also achieve this with the following regex: ([^,(]+(?=,|$)|[\w]+\(.*\)(?=,|$))
Given the text 1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11 it produces one match per parameter, based on the commas that are not enclosed in ().
So, the output would be:
Match 1
Group 1. 0-1 `1`
Match 2
Group 1. 2-4 ` 2`
Match 3
Group 1. 5-7 ` 3`
Match 4
Group 1. 9-35 `add(4, 5, add(6, 7, 8), 9)`
Match 5
Group 1. 36-39 ` 10`
Match 6
Group 1. 40-43 ` 11`
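One caveat with the regex above: its .* is greedy, so an input with two separate top-level calls, such as add(1, 2), 3, mul(4, 5), comes back as a single match running from the first ( to the last ). The depth-counting approach from the earlier answer does not have this limitation. Below is a sketch of that approach as a reusable method; the class name TopLevelSplitter is hypothetical, and the code assumes the parentheses in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Hypothetical helper: splits on commas that are not inside parentheses.
    // Assumes balanced parentheses; unbalanced input is not validated.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // current nesting level of parentheses
        int start = 0; // index where the current parameter begins
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(input.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(input.substring(start).trim()); // last parameter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("1, 2, 3, add(4, 5, add(6, 7, 8), 9), 10, 11"));
        System.out.println(split("add(1, 2), 3, mul(4, 5)")); // [add(1, 2), 3, mul(4, 5)]
    }
}
```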

Java: extract the single matching groups from a string with regular expression [duplicate]

I have this kind of string: 16B66C116B or 222A3*C10B
It's a number (with an unknown number of digits) followed either by a letter ("A") or by a star and a letter ("*A"). This pattern is repeated 3 times.
I want to split this string to get: [number,text,number,text,number,text]
[16, B, 66, C, 116, B]
or
[16, B, 66, *C, 116, B]
I wrote this:
String tmp = "16B66C116B";
String tmp2 = "16B66*C116B";
String pattern = "(\\d+)(\\D{1,2})(\\d+)(\\D{1,2})(\\d+)(\\D{1,2})";
boolean q = tmp.matches(pattern);
String a[] = tmp.split(pattern);
The pattern matches correctly, but the splitting doesn't work.
(I'm open to improving my pattern string; I think it could be written better.)
You are misunderstanding what split does. It splits the string at each occurrence of the given regular expression; since your expression matches the whole string, it returns an empty array.
What you want is to extract the single matching groups (the stuff in the brackets) from the match. To achieve this you have to use the Pattern and Matcher classes.
Here is a code snippet which prints all the matched groups:
Pattern regex = Pattern.compile("(\\d+)(\\D{1,2})(\\d+)(\\D{1,2})(\\d+)(\\D{1,2})");
Matcher matcher = regex.matcher("16B66C116B");
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); ++i) {
System.out.println(matcher.group(i));
}
}
Of course, you can tighten the regular expression (as another user suggested):
(\\d+)([A-Z]+)(\\d+)(\\*?[A-Z]+)(\\d+)([A-Z]+)
Try this pattern, (\\d)+|(\\D)+, and use Matcher#find() to find the next subsequence of the input that matches it.
Add each match to a List, and convert the List to an array at the end if needed.
String tmp = "16B66C116B";
String tmp2 = "16B66*C116B";
String pattern = "((\\d)+|(\\D)+)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(tmp);
while (m.find()) {
System.out.println(m.group());
}
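If split is still preferred, it can be made to work by splitting on the zero-width boundaries between digits and non-digits rather than on the text itself. This is a sketch using lookbehind and lookahead; the class name DigitLetterSplit is made up for illustration.

```java
import java.util.Arrays;

class DigitLetterSplit {
    // Hypothetical helper: splits at every position where a digit meets a non-digit.
    // Both alternatives are zero-width, so no characters are consumed by the split.
    static String[] split(String input) {
        return input.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("16B66C116B")));  // [16, B, 66, C, 116, B]
        System.out.println(Arrays.toString(split("16B66*C116B"))); // [16, B, 66, *C, 116, B]
    }
}
```

Because * and C are both non-digits, no boundary is found between them and *C stays together as one element.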

java string split regular expression

I have a string (1, 2, 3, 4), and I want to parse the integers into an array.
I can use split(",\\s") to split all but the first and last elements. My question is: how can I modify it so that the opening and closing parentheses are ignored?
You'd be better served by matching the numbers instead of matching the space between them. Use
final Matcher m = Pattern.compile("\\d+").matcher("(1, 2, 3, 4)");
while (m.find()) System.out.println(Integer.parseInt(m.group()));
Use two regexes: the first removes the parentheses, the second splits:
Pattern p = Pattern.compile("\\((.*)\\)");
Matcher m = p.matcher(str);
if (m.find()) {
String[] elements = m.group(1).split("\\s*,\\s*");
}
Also note my modification of your split regex: it tolerates any amount of whitespace around the commas, which makes it more flexible and safer.
You could use substring() and then split(",")
String s = "(1,2,3,4)";
String s1 = s.substring(1, s.length() - 1); // end index is exclusive, so this drops both parentheses
System.out.println(s1);
String[] ss = s1.split(",");
for(String t : ss){
System.out.println(t);
}
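Change it to use split("[(),\\s]+") instead. Note that the resulting array then starts with an empty string, because the input begins with a delimiter.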
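Since the goal was to parse the integers into an array, the matching approach from the first answer can be taken one step further. The following is a sketch, assuming non-negative integers as in the example; the class name IntListParser and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntListParser {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: extracts every run of digits and returns them as an int array.
    // Parentheses, commas and whitespace are ignored because they never match \d+.
    static int[] parse(String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            values.add(Integer.parseInt(m.group()));
        }
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("(1, 2, 3, 4)"))); // [1, 2, 3, 4]
    }
}
```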
