Regular expression "\\d?" giving incorrect output [duplicate] - java

This question already has an answer here:
SCJP6 regex issue
(1 answer)
Closed 7 years ago.
Sample code
Pattern p = Pattern.compile("\\d?");
Matcher m = p.matcher("ab34ef");
boolean b = false;
while (m.find()) {
    System.out.print(m.start()); // + m.group());
}
Answer: 012456
But the string's total length is 6, so how can m.start() return 6 when indices start from 0?

\d? matches zero or one digit, so it can also match at the position just past the last character of the string, as a zero-width match.
Note that your output is not in fact attained by \d?, but by \d*. You should change either one or the other to make the question self-consistent.

\d? matches zero or one digit, so it matches every digit, but it also matches the empty string at every position where no digit follows, including the end of the input.
To skip the empty matches, require at least one digit:
Pattern p = Pattern.compile("\\d+");
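To see the zero-width matches directly, a small sketch like the following collects the start index of every match for each of the three quantifiers discussed above. The class name and the starts helper are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthDemo {
    // Hypothetical helper: concatenates the start index of every match.
    static String starts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.start());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(starts("\\d?", "ab34ef")); // 0123456
        System.out.println(starts("\\d*", "ab34ef")); // 012456
        System.out.println(starts("\\d+", "ab34ef")); // 2
    }
}
```

The trailing 6 in the first two lines is the empty match at the end of the input. \d+ cannot match the empty string, so it reports only the run of digits starting at index 2.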

Related

kotlin/java match a number in a string with a regular expression [duplicate]

This question already has answers here:
How to extract numbers from a string and get an array of ints?
(13 answers)
Closed 1 year ago.
For example, if I have these strings, is there any way I can get 123 of all these strings, or 777 or 888?
https://www.example.com/any/123/ and
https://www.example.com/any/777/123/ and
https://www.example.com/any/777/123/888
What I mean is how to match the first or second or the third last number in the string.
You can use capture groups to collect every number in each string:
val strList = listOf("https://www.example.com/any/777/123/888", "https://www.example.com/any/123/", "https://www.example.com/any/777/123/")
val intList = mutableListOf<Int>()
val regex = Regex("/?(\\d+)")
strList.forEach { str ->
    regex.findAll(str).forEach {
        intList.add(it.groupValues[1].toInt())
    }
}
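Assuming every number directly follows a slash and only further numbers (plus an optional trailing slash) come after it,
(?<=/)\d+(?=(?:/\d+){0}/?$) matches the last number
(?<=/)\d+(?=(?:/\d+){1}/?$) matches the second to last number
(?<=/)\d+(?=(?:/\d+){2}/?$) matches the third to last,
etc. The repetition count goes inside the lookahead, together with the end anchor.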
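One way to make the lookahead idea runnable in Java is to put the repetition inside the lookahead and allow an optional trailing slash. The sketch below does that. The class and method names are hypothetical, and it assumes the numbers are the final path segments of the URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthFromLast {
    // Hypothetical helper: returns the number that is followed by exactly
    // `after` further numeric segments (and an optional trailing slash),
    // or null if there is no such number.
    static String numberFromEnd(String url, int after) {
        Pattern p = Pattern.compile("(?<=/)\\d+(?=(?:/\\d+){" + after + "}/?$)");
        Matcher m = p.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "https://www.example.com/any/777/123/888";
        System.out.println(numberFromEnd(url, 0)); // 888
        System.out.println(numberFromEnd(url, 1)); // 123
        System.out.println(numberFromEnd(url, 2)); // 777
    }
}
```

For https://www.example.com/any/777/123/ the same helper returns 123 for an offset of 0 and 777 for an offset of 1, because the trailing slash is consumed by the optional /? before the end anchor.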
With Java, you can use the Pattern and Matcher classes from the java.util.regex package.
For your case, where you want to match integers, use the predefined character class \d, which matches a digit:
String str = "https://www.example.com/any/777/123/";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}
This loops through the string and prints each match as it is found.

Extract numbers from a string based on a another string [duplicate]

This question already has answers here:
Java Regex group 0
(2 answers)
Closed 4 years ago.
I am trying to write a regex for a string of the form [digit] to [digit], e.g. 1 to 5. If the word "to" appears in the given string, I want to extract the numbers before and after it. I have tried this, but it's not working:
Pattern p = Pattern.compile("([0-9]+)\\bto\\b([0-9]+)");
Matcher m = p.matcher("1 to 5");
m.find();
System.out.println(m.group(0));
System.out.println(m.group(1));
System.out.println(m.group(2));
Expected output:
1
to
5
Consider adding a group for the "to" part.
Also, to match the spaces you want \\s, not \\b: \\b is a zero-width word boundary and does not consume the space character.
Pattern p = Pattern.compile("([0-9]+)\\s(to)\\s([0-9]+)");
Matcher m = p.matcher("1 to 5");
m.find();
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
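A slightly more defensive sketch of the same idea checks the result of find() before reading any group, and returns group 0 alongside the three captured parts so the difference is visible. The class and method names are hypothetical, and \\s+ (one or more whitespace characters) is an assumption that goes a little beyond the single \\s above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeParser {
    // Hypothetical helper: returns {group 0, group 1, group 2, group 3},
    // or null when the input does not contain "<digits> to <digits>".
    static String[] parse(String input) {
        Matcher m = Pattern.compile("([0-9]+)\\s+(to)\\s+([0-9]+)").matcher(input);
        if (!m.find()) {
            // Reading a group without a successful match would throw IllegalStateException.
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String part : parse("1 to 5")) {
            System.out.println(part); // prints "1 to 5", then "1", "to", "5"
        }
    }
}
```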
And as said in the comments: "Group zero denotes the entire pattern", so m.group(0) returns the whole match and the captured parts start at group 1.
Is it necessary to use a regex? If not, you can use String methods instead:
String s = "23 to 34";
String toString = "to";
if (s.contains(toString)) {
    int startIndex = s.indexOf(toString);
    int endIndex = startIndex + toString.length();
    String s1 = s.substring(0, startIndex); // get the first number
    String s2 = s.substring(endIndex); // get the second number
    System.out.println(s1.trim()); // remove any surrounding whitespace
    System.out.println(toString);
    System.out.println(s2.trim());
}

Java regex behaving weird [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have the below test case,
@Test
public void test_check_pattern_match_caseInSensitive_for_pre_sampling_filename() {
// given
String pattern = "Sample*.*Selection*.*Preliminary";
// when
// then
assertThat(Util.checkPatternMatchCaseInSensitive(pattern, "Sample selectiossn preliminary"), is(false));
assertThat(Util.checkPatternMatchCaseInSensitive(pattern, "sample selection preliminary"), is(true));
}
The Util method is:
public static boolean checkPatternMatchCaseInSensitive(String pattern, String value) {
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = p.matcher(value);
if (matcher.find())
return true;
return false;
}
Can someone please explain why the regex Sample*.*Selection*.*Preliminary matches the file name "Sample selectiossn preliminary"?
The test case should pass, but it fails on the first assert. :S
In a regex, * means zero or more of the preceding element, while . means any single character.
What your expression is looking for is:
1. Exactly Sampl
2. 0 or more e
3. 0 or more of any char
4. Exactly Selectio
5. 0 or more n
6. 0 or more of any char
7. And so on
The problem falls under points 5 and 6:
no n is found at point 5, which is allowed, and "ssn " is matched by point 6.
Selection* matches "selectio" (with zero n characters).
.* matches "ssn ".
Preliminary matches "preliminary".
The regex n* means zero or more n characters.
The regex . means any single character.
The regex .* means zero or more of any character.
The culprit is the *.* sequence.
You have "Selection*.*", which means "Selectio", then any number (including zero) of the letter "n", then any number (including zero) of any character.
Here the n* part matches zero times, i.e. the empty string "", and the .* part matches four characters, "ssn ".
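If the intent was "Sample, then anything, then Selection, then anything, then Preliminary", the stray * after each word can simply be dropped. The sketch below compares both patterns using the same case-insensitive find() as the Util method in the question. The class name is made up and the corrected pattern is an assumption about what the asker wanted.

```java
import java.util.regex.Pattern;

class WildcardFix {
    // Same check as the Util method in the question: case-insensitive find().
    static boolean matches(String regex, String value) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(value).find();
    }

    public static void main(String[] args) {
        String original = "Sample*.*Selection*.*Preliminary";
        String fixed = "Sample.*Selection.*Preliminary"; // assumed intent

        System.out.println(matches(original, "Sample selectiossn preliminary")); // true
        System.out.println(matches(fixed, "Sample selectiossn preliminary"));    // false
        System.out.println(matches(fixed, "sample selection preliminary"));      // true
    }
}
```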

Need a Regex that extracts a string between two "delimiting" strings [duplicate]

This question already has answers here:
Java Regex Capturing Groups
(4 answers)
Closed 6 years ago.
I need to get the string between by_ and _on.
So far I have this, but don't understand how to truncate the actual "string delimiters":
by_(.*)_on
Sample input:
Files_by_wesasegeaazedude_on_January_26.jpg
Current Match:
by_wesasegeaazedude_on
Needed Match:
wesasegeaazedude
Your expression is good*. All you need to do is extract the content of the first capturing group:
Pattern regex = Pattern.compile("by_(.*)_on");
String str = "Files_by_wesasegeaazedude_on_January_26.jpg";
Matcher m = regex.matcher(str);
if (m.find()) {
String res = m.group(1);
}
* Well, almost good. If you expect inputs with multiple file names on the same line, you may want to use a reluctant quantifier, i.e. by_(.*?)_on
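The difference between the greedy and the reluctant form only shows up when a line holds more than one file name. The following sketch uses a made-up input line and a hypothetical names helper to compare the two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: collects group 1 of every match on the line.
    static List<String> names(String regex, String line) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input with two file names on one line.
        String line = "Files_by_alice_on_Jan.jpg Files_by_bob_on_Feb.jpg";
        System.out.println(names("by_(.*)_on", line));  // [alice_on_Jan.jpg Files_by_bob]
        System.out.println(names("by_(.*?)_on", line)); // [alice, bob]
    }
}
```

The greedy .* runs to the last _on on the line, while .*? stops at the first one after each by_.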
I would do this without regular expressions.
int start = str.indexOf("by_");
int end = str.indexOf("_on", start + 3); // or lastIndexOf("_on") for the equivalent of a greedy match
assert start >= 0 && end >= start + 3;
String part = str.substring(start + 3, end);
You can simply use positive lookarounds:
String regex = "(?<=by_).*(?=_on)";
What this regex does is:
match anything: .*
that is preceded by by_: (?<=by_)
and followed by _on: (?=_on)
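With lookarounds the delimiters are not part of the match, so group() already returns the wanted text and no capturing group is needed. A minimal sketch, with a hypothetical class and method name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: returns the text between "by_" and "_on", or null.
    static String between(String input) {
        Matcher m = Pattern.compile("(?<=by_).*(?=_on)").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(between("Files_by_wesasegeaazedude_on_January_26.jpg")); // wesasegeaazedude
    }
}
```

The .* here is still greedy, so on a line with several _on delimiters it extends to the last one.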

Parsing array syntax using regex

I think what I am asking is either very trivial or already asked, but I have had a hard time finding answers.
We need to capture the inner number characters between brackets within a given string.
so given the string
StringWithMultiArrayAccess[0][9][4][45][1]
and the regex
^\w*?(\[(\d+)\])+?
I would expect 6 capture groups and access to the inner data.
However, I end up only capturing the last "1" character in capture group 2.
In case it matters, here is my Java JUnit test:
@Test
public void ensureThatJsonHandlerCanHandleNestedArrays(){
String stringWithArr = "StringWithMultiArray[0][0][4][45][1]";
Pattern pattern = Pattern.compile("^\\w*?(\\[(\\d+)\\])+?");
Matcher matcher = pattern.matcher(stringWithArr);
matcher.find();
assertTrue(matcher.matches()); //passes
System.out.println(matcher.group(2)); //prints 1 (matched from last array symbols)
assertEquals("0", matcher.group(2)); //expected but its 1 not zero
assertEquals("45", matcher.group(5)); //only 2 capture groups exist, the whole string and the 1 from the last array brackets
}
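In order to capture each number, you need to change your regex so that it (a) captures a single number and (b) is not anchored to, and therefore limited by, any other part of the string ("^\w*?" anchors it to the start of the string). A capturing group inside a quantifier only retains the text of its last iteration, which is why group 2 held just the final "1". With the simpler regex you can loop through the matches: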
Matcher mtchr = Pattern.compile("\\[(\\d+)\\]").matcher(arrayAsStr);
while(mtchr.find()) {
System.out.print(mtchr.group(1) + " ");
}
Output:
0 9 4 45 1
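If the indices are needed as values rather than printed, the same loop can collect them into a list. This is a sketch with hypothetical class and method names. It assumes every index fits in an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArrayIndexParser {
    // Hypothetical helper: returns every bracketed number, in order of appearance.
    static List<Integer> indices(String expr) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\[(\\d+)\\]").matcher(expr);
        while (m.find()) {
            out.add(Integer.parseInt(m.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(indices("StringWithMultiArrayAccess[0][9][4][45][1]")); // [0, 9, 4, 45, 1]
    }
}
```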
