Get number in string with a regexp - java

I'm trying to extract numbers from a String. Only numbers delimited by whitespace should match.
This code works in most cases, except when two numbers are separated by only one space.
Pattern pattern = Pattern.compile("(^|\\s)[0-9]+(\\s|$)");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
numericResquest.add(Integer.parseInt(matcher.group().trim()));
}
For example:
OK: 11 rre 12
OK: 11 12 (two spaces between the numbers)
Not OK: 11 test 12 11 rre (only one space between the numbers, so not all of them are found)
Thank you

Why not just match the digits in your regex with lookarounds, so that the surrounding whitespace is not consumed:
Pattern pattern = Pattern.compile("(?<!\\S)[0-9]+(?!\\S)");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
numericResquest.add(Integer.parseInt(matcher.group()));
}

String[] numbers = value.split(" ");
for(String number : numbers) {
numericRequest.add(Integer.parseInt(number));
}
If you get a NumberFormatException, the input was not formatted correctly.
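If the input may also contain non-numeric tokens (as in the question's "11 rre 12"), a variation on the split approach is to skip those tokens instead of letting parseInt throw. The following is a minimal sketch under that assumption; the class name NumberTokens and the method extract are made up for illustration and do not come from the posts above.

```java
import java.util.ArrayList;
import java.util.List;

class NumberTokens {
    // Returns every whitespace-separated token that consists only of ASCII digits.
    static List<Integer> extract(String value) {
        List<Integer> numbers = new ArrayList<>();
        // trim() avoids a leading empty token; \\s+ treats a run of whitespace as one separator
        for (String token : value.trim().split("\\s+")) {
            if (token.matches("[0-9]+")) {
                // Note: parseInt still throws for digit runs that overflow an int
                numbers.add(Integer.parseInt(token));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("11 rre 12"));         // [11, 12]
        System.out.println(extract("11 test 12 11 rre")); // [11, 12, 11]
    }
}
```

Splitting on \\s+ rather than a single space also handles inputs with two or more spaces between numbers.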

The problem is that, not counting cases at the beginning or end of the string, your pattern requires a space both before and after the number. So if your string is "11 12", the first match will find "11 ", with a space at the end. The matcher's index will point after the pattern, i.e. to "12". But since the pattern also requires a space at the beginning, the next attempt to match won't work, because there's no space at the beginning of "12".
One way to solve this while using the same approach: use matcher.lookingAt instead of matcher.find. lookingAt matches only if the pattern occurs at the start of the matcher's region, so the region has to be moved forward after each match. Then you can fix your pattern so that there doesn't have to be a space at the beginning.
Pattern pattern = Pattern.compile("\\s*[0-9]+(\\s|$)");
Matcher matcher = pattern.matcher(value);
while (matcher.lookingAt()) {
    numericResquest.add(Integer.parseInt(matcher.group().trim()));
    // lookingAt() always starts at the beginning of the region, so advance it past this match
    matcher.region(matcher.end(), value.length());
}
This allows, but doesn't require, any number of spaces at the current index before the number occurs. Note that the loop stops at the first token that is not a number.
(Note: I haven't tested this.)
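The consumed-space problem described above can be reproduced in a few lines. The sketch below runs the question's pattern and a lookaround-based variant on the same input; the class ConsumedSpaceDemo and the helper findAll are illustrative names, and the lookaround pattern is just one possible way to avoid consuming the separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedSpaceDemo {
    // Collects every match of regex in value, with surrounding whitespace trimmed.
    static List<String> findAll(String regex, String value) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(value);
        while (matcher.find()) {
            found.add(matcher.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String value = "11 test 12 11 rre";
        // The match " 12 " consumes the space that the following 11 would need in front of it.
        System.out.println(findAll("(^|\\s)[0-9]+(\\s|$)", value));  // [11, 12]
        // Lookarounds only assert "no non-space character here" and consume nothing.
        System.out.println(findAll("(?<!\\S)[0-9]+(?!\\S)", value)); // [11, 12, 11]
    }
}
```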

Regex
(^|(?<=\s))(\d+)(?=\s|$)
Iterate over all matches and capturing groups in a string
try {
Pattern regex = Pattern.compile("(^|(?<=\\s))(\\d+)(?=\\s|$)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
for (int i = 1; i <= regexMatcher.groupCount(); i++) {
// matched text: regexMatcher.group(i)
// match start: regexMatcher.start(i)
// match end: regexMatcher.end(i)
}
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

Related

Regex for extracting digits in version format

I want to extract numbers from a string. The numbers represent a version.
That is, I want to match numbers that appear between:
_ and /
/ and /
I have prepared the following regex, but it doesn't work as expected:
.*[\/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})\/.*
For the following example, the regex should match twice:
Input: name_1.1.1/9.10.0/abc. Expected result: 1.1.1 and 9.10.0
However, my regex returns only 9.10.0; 1.1.1 is omitted. Do you have any idea what is wrong?
You could just split the string on _ or /, and then retain components which appear to be versions:
List<String> versions = new ArrayList<>();
String input = "name_1.1.1/9.10.0/abc";
String[] parts = input.split("[_/]");
for (String part : parts) {
if (part.matches("\\d+(?:\\.\\d+)*")) {
versions.add(part);
}
}
System.out.println(versions); // [1.1.1, 9.10.0]
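You can assert the / at the end with a lookahead instead of matching it, and omit the .* on both sides. Because the lookahead does not consume the /, the same / can start the next match.
Note that you don't have to escape the /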
[/_](\d{1,2}[.]\d{1,2}[.]\d{1,2})(?=/)
Example code
String regex = "[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)";
String string = "name_1.1.1/9.10.0/abc";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
1.1.1
9.10.0
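To see why asserting the trailing / matters, the following sketch compares a variant that consumes the / with the lookahead version from the answer. The class SlashLookaheadDemo and the helper versions are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashLookaheadDemo {
    // Returns capture group 1 of every match of regex in input.
    static List<String> versions(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "name_1.1.1/9.10.0/abc";
        // The trailing / is consumed, so 9.10.0 has no separator left in front of it.
        System.out.println(versions("[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})/", input));     // [1.1.1]
        // The lookahead leaves the / in place for the next match.
        System.out.println(versions("[/_](\\d{1,2}[.]\\d{1,2}[.]\\d{1,2})(?=/)", input)); // [1.1.1, 9.10.0]
    }
}
```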
Another option could be using a positive lookbehind to assert either a / or _ to the left, and get a match only.
(?<=[/_])\d{1,2}[.]\d{1,2}[.]\d{1,2}(?=/)
Code Demo
String regex = "(\\d+\\.\\d+\\.\\d+)"; // escape the dots so that they match only a literal '.'
String string = "name_1.1.1/9.10.0/abc";
String string2 = "randomversion4.5.6/09.7.8_9.88.9";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
Matcher matcher2 = pattern.matcher(string2);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
while (matcher2.find()) {
System.out.println(matcher2.group(1));
}
Out:
1.1.1
9.10.0
4.5.6
09.7.8
9.88.9
Just write a regex for what you want to match, in this case only the version number.
A regex can be used either to match a whole string or to find a substring within a string.
When you search for a substring, you don't need to describe the rest of the filename or string, and you usually can't anticipate it anyway. So match only what you want to find.
This way you can find the versions no matter what string they are in.
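One detail worth checking when matching only the version number: an unescaped dot in a regex matches any character, not just a literal '.'. The sketch below shows the difference on a made-up input string; EscapedDotDemo and findAll are illustrative names, not part of any answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedDotDemo {
    // Collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "build 1a2b3 release 1.2.3";
        // Unescaped dots match any character, so "1a2b3" is accepted too.
        System.out.println(findAll("\\d+.\\d+.\\d+", input));     // [1a2b3, 1.2.3]
        // Escaped dots match only a literal '.'.
        System.out.println(findAll("\\d+\\.\\d+\\.\\d+", input)); // [1.2.3]
    }
}
```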

How to delete everything after the last number in a String in Java?

If I have a String that consists of letters and numbers, how can I get rid of everything after the last number in the String?
Example:
banana_orange_62_34_wednesday would become banana_orange_62_34
1234_4564_www_6_j_1_rrrr would become 1234_4564_www_6_j_1
I tried this so far:
int endIndex = inputXMLFilename.lastIndexOf("\\d+");
inputXMLFilename = inputXMLFilename.substring(0, endIndex);
Use regex replace:
str = str.replaceAll("\\D+$", "");
What the regex means:
\D means “non-digit”
+ means “one or more of the previous term, greedy (as much of the input as possible)”
$ means “end of input”
The $ anchors the match to the end, without which this would match (and delete) all non-digits.
lastIndexOf() only works with plain text, not regex.
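Since lastIndexOf() cannot take a regex, the index of the last digit has to be found some other way if you want to keep the substring approach from the question. A minimal sketch, assuming ASCII digits only and assuming that a string without any digit should be returned unchanged (unlike the replaceAll version, which would empty it); TrimAfterLastDigit and cut are hypothetical names.

```java
class TrimAfterLastDigit {
    // Returns s cut off right after its last ASCII digit.
    // Assumption: if s contains no digit at all, it is returned unchanged.
    static String cut(String s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                return s.substring(0, i + 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(cut("banana_orange_62_34_wednesday")); // banana_orange_62_34
        System.out.println(cut("1234_4564_www_6_j_1_rrrr"));      // 1234_4564_www_6_j_1
    }
}
```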
@Test
public void cutAfterLastDigit() {
String s = "banana_orange_62_34_wednesday";
Pattern pattern = Pattern.compile("^(.*\\d)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}

Regex capturing groups within logical OR

I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23224a2f4
/auto/445478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is:
1212, d3fe
23224, a2f4
445478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
UPDATED QUESTION:
I have a set of strings I need to parse and extract values from. They look like:
/apple/1212d3fe
/cat/23e24a2f4
/auto/df5478eefd
/somethingelse/1234fded
It should match only apple, cat and auto. The output I expect is everything after the 2nd '/', split as follows: 4 characters if 'apple', 5 characters if 'cat' and 6 characters if 'auto', like:
1212, d3fe
23e24, a2f4
df5478, eefd
null
I need to come up with a regex capturing groups to do the same. I am able to extract the second part but not the first one. The closest I came up with is:
String r2 = "^/(apple/[0-9]{4}|cat/[0-9]{5}|auto/[0-9]{6})([a-f0-9]{4})$";
System.out.println(r2);
Pattern pattern2 = Pattern.compile(r2);
Matcher matcher2 = pattern2.matcher("/apple/2323efff");
if (matcher2.find()) {
System.out.println(matcher2.group(1));
System.out.println(matcher2.group(2));
}
I can do it without the regex OR(|) but it breaks when I include it. Any help with the right regex?
Updated Answer:
As per your updated question you can use this regex based on lookbehind assertions:
/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$
This regex uses 2 capture groups after matching /.
In the 1st group we have 3 alternatives, each guarded by a lookbehind.
(?<=apple/).{4} makes sure that we match 4 characters that have apple/ on the left-hand side. Likewise we match 5- and 6-character strings that have cat/ and auto/ on the left-hand side.
In the 2nd capture group we match the remaining characters before the end of the line.
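The lookbehind regex can be tried against the strings from the updated question with a short sketch like the one below. The class PrefixLengthDemo and the method split are names invented for this example; returning null for a non-matching input mirrors the expected output in the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixLengthDemo {
    private static final Pattern PATTERN =
            Pattern.compile("/((?<=apple/).{4}|(?<=cat/).{5}|(?<=auto/).{6})(.+)$");

    // Returns {first part, second part}, or null if the input does not match.
    static String[] split(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "/apple/1212d3fe", "/cat/23e24a2f4", "/auto/df5478eefd", "/somethingelse/1234fded"
        };
        for (String input : inputs) {
            // Prints [1212, d3fe], [23e24, a2f4], [df5478, eefd], null
            System.out.println(Arrays.toString(split(input)));
        }
    }
}
```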
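You could use the regex \/(?:apple|auto|cat)\/(\d*)(.*). Note that (\d*) captures only leading digits, so this fits the original examples but not a first part that contains letters.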
If you want the last group to have exactly 4 digits you can use this regex:
/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})
Here is a working example:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd");
Pattern pattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)([0-9a-f]{4})");
for (String string : strings) {
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
}
If you want four digits after apple, 5 after cat and 6 after auto, you can split your algorithm into 2 parts:
List<String> strings = Arrays.asList("/apple/1212d3fe", "/cat/23224a2f4", "/auto/445478eefd", "/some/445478eefd");
Pattern firstPattern = Pattern.compile("/(apple|cat|auto)/([0-9a-f]+)");
for (String string : strings) {
Matcher firstMatcher = firstPattern.matcher(string);
if (firstMatcher.find()) {
String first = firstMatcher.group(1);
System.out.println(first);
int length = getLength(first);
Pattern secondPattern = Pattern.compile("([0-9a-f]{" + length + "})([0-9a-f]{4})");
Matcher secondMatcher = secondPattern.matcher(string);
if (secondMatcher.find()) {
System.out.println(secondMatcher.group(1));
System.out.println(secondMatcher.group(2));
}
}
}
private static int getLength(String key) {
switch (key) {
case "apple":
return 4;
case "cat":
return 5;
case "auto":
return 6;
}
throw new IllegalArgumentException("key not allowed");
}

Finding a Digit using Pattern and Matcher

I am trying to use Pattern and Matcher to determine if a given string has a space between 2 digits. For example "5 1" should come back as true, "51" should come back as false. At first I was using string.replaceAll with the regex and it worked great, but after moving to Pattern I can't seem to get it to work.
String findDigit = "5 1/3";
String regex = "(\\d) +(\\d)";
findDigit = findDigit.replaceAll(regex, "$1 $2");
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(findDigit);
System.out.println(m.matches());
System.out.println(m.hitEnd());
I first started with this. The replaceAll works without a hitch and removes the extra spaces, but the m.matches and the m.hitEnd both return false. Then I thought I might be doing something wrong so I simplified the case to just
String findDigit = "5";
String regex = "\\d";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(findDigit);
System.out.println(m.matches());
System.out.println(m.hitEnd());
and matches comes back true (obviously) but when I change it to this
String findDigit = "5 3";
String regex = "\\d";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(findDigit);
System.out.println(m.matches());
System.out.println(m.hitEnd());
both come back false. So I guess my main question is: how do I determine that there is ANY digit in my string, and then, more specifically, how do I determine if there is a digit-space-digit sequence in my string? I thought that was what hitEnd was for, but I guess I am mistaken. Thanks in advance.
If you're looking for a match with multiple spaces but would like to preserve the formatting of the output you could use groups and back-references.
For instance:
String input = "blah 5 6/7";
Pattern p = Pattern.compile("(\\d)\\s+(\\d)");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.printf("Whole match: %s\n\tFirst digit: %s\n\tSecond digit: %s\n", m.group(), m.group(1), m.group(2));
}
Output
Whole match: 5 6
First digit: 5
Second digit: 6
The answer is of course m.find(). Sorry for being stupid this morning. Thanks to all who even looked at this :)
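The difference comes down to what each method checks: matches() succeeds only if the entire input matches the pattern, while find() looks for the pattern anywhere in the input. A minimal sketch of the digit-space-digit check, using the illustrative names FindVersusMatches and hasSpacedDigits:

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    private static final Pattern DIGIT_SPACE_DIGIT = Pattern.compile("\\d +\\d");

    // True if the input contains a digit, one or more spaces, and another digit anywhere.
    static boolean hasSpacedDigits(String input) {
        return DIGIT_SPACE_DIGIT.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpacedDigits("5 1/3")); // true
        System.out.println(hasSpacedDigits("51"));    // false
        // matches() requires the whole input to match, so the trailing "/3" makes it fail.
        System.out.println(DIGIT_SPACE_DIGIT.matcher("5 1/3").matches()); // false
        System.out.println(DIGIT_SPACE_DIGIT.matcher("5 1").matches());   // true
    }
}
```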

Java Regex for changing every ith index in every word of a string

I've written a regex \b\S\w(\S(?=.)) to find every third symbol in a word and replace it with '1'. Now I'm trying to use this expression but really don't know how to do it right.
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
Matcher matcher = pattern.matcher("lemon apple strawberry pumpkin");
while (matcher.find()) {
System.out.print(matcher.group(1) + " ");
}
So result is:
m p r m
And how can I use this to make a string like this
le1on ap1le st1awberry pu1pkin
You could use something like this:
"lemon apple strawberry pumpkin".replaceAll("(?<=\\b\\S{2})\\S", "1")
Would produce your example output. The regex replaces any non-space character that is preceded by a word boundary followed by two non-space characters.
This means that "words" like 12345 would be changed into 12145 since 3 is matched by \\S (not space).
Edit:
Updated the regex to better cater to the revised question title. Change 2 to i-1 to replace the ith letter of the word.
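The "change 2 to i-1" idea can be wrapped in a small helper that builds the regex from i. This is a sketch only: ReplaceIthChar and replaceIth are hypothetical names, and like the regex above it treats any run of non-space characters starting at a word boundary as a word.

```java
import java.util.regex.Matcher;

class ReplaceIthChar {
    // Replaces the i-th (1-based) non-space character of every word with replacement.
    // Words shorter than i characters are left unchanged.
    static String replaceIth(String text, int i, char replacement) {
        if (i < 1) {
            throw new IllegalArgumentException("i must be at least 1");
        }
        // Lookbehind: a word boundary followed by exactly i-1 non-space characters
        String regex = "(?<=\\b\\S{" + (i - 1) + "})\\S";
        // quoteReplacement keeps characters such as '$' from being read as group references
        return text.replaceAll(regex, Matcher.quoteReplacement(String.valueOf(replacement)));
    }

    public static void main(String[] args) {
        String text = "lemon apple strawberry pumpkin";
        System.out.println(replaceIth(text, 3, '1')); // le1on ap1le st1awberry pu1pkin
        System.out.println(replaceIth(text, 2, '1')); // l1mon a1ple s1rawberry p1mpkin
    }
}
```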
There is another way: use the index of the match reported by the matcher.
Like this:
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
String string = "lemon apple strawberry pumpkin";
char[] c = string.toCharArray();
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
// matcher.end() - 1 is the index of the last matched character (the third character of the word).
// It may not be perfect, but it shows how to get the index at which the string matches the pattern.
c[matcher.end() - 1] = '1';
}
System.out.println(c);
