How can I count the number of matches for a regex? - java

Let's say I have a string which contains this:
HelloxxxHelloxxxHello
I compile a pattern to look for 'Hello'
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");
It should find three matches. How can I get a count of how many matches there were?
I've tried various loops and using the matcher.groupCount() but it didn't work.

matcher.find() does not find all matches, only the next match.
Solution for Java 9+
long matches = matcher.results().count();
Solution for Java 8 and older
You'll have to do the following (starting from Java 9, there is a nicer solution):
int count = 0;
while (matcher.find())
count++;
Btw, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.
Complete example:
import java.util.regex.*;
class Test {
public static void main(String[] args) {
String hello = "HelloxxxHelloxxxHello";
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher(hello);
int count = 0;
while (matcher.find())
count++;
System.out.println(count); // prints 3
}
}
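For comparison, here is a minimal sketch that assumes Java 9+ (for Matcher.results()). It counts matches with results().count(). It also prints groupCount() for a couple of patterns, to show that groupCount() depends only on the parentheses in the pattern and not on the input. The class and method names (MatchCountDemo, countMatches, capturingGroups) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCountDemo {

    // Counts non-overlapping matches; Matcher.results() needs Java 9+.
    static long countMatches(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.results().count();
    }

    // Number of capturing groups declared in the pattern; the input plays no role.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("HelloxxxHelloxxxHello", "Hello")); // 3
        System.out.println(capturingGroups("Hello"));                      // 0
        System.out.println(capturingGroups("(Hel)(lo)"));                  // 2
    }
}
```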
Handling overlapping matches
When counting matches of aa in aaaa the above snippet will give you 2.
aaaa
aa
  aa
To get 3 matches, i.e. this behavior:
aaaa
aa
 aa
  aa
You have to search for a match at index <start of last match> + 1 as follows:
String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);
int count = 0;
int i = 0;
while (matcher.find(i)) {
count++;
i = matcher.start() + 1;
}
System.out.println(count); // prints 3
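Another common way to count overlapping matches is sketched below. It assumes the regex can be wrapped in a lookahead, so you search for (?=regex) instead of regex. A lookahead matches an empty string at every position where the regex could start, so it consumes no characters and counts each starting position once. For a pattern like aa this gives the same result as the find(start + 1) loop. The names LookaheadOverlapCount and countOverlapping are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOverlapCount {

    static int countOverlapping(String input, String regex) {
        // (?=...) is a zero-width, non-capturing lookahead: it matches wherever
        // regex could begin, without consuming the matched characters.
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        // After an empty match, find() advances one character before searching again.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aaaa", "aa"));     // 3
        System.out.println(countOverlapping("aaaaaaaa", "aa")); // 7
    }
}
```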

This should work for matches that might overlap:
public static void main(String[] args) {
String input = "aaaaaaaa";
String regex = "aa";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
int from = 0;
int count = 0;
while(matcher.find(from)) {
count++;
from = matcher.start() + 1;
}
System.out.println(count); // prints 7
}

From Java 9, you can use the stream provided by Matcher.results()
long matches = matcher.results().count();

If you want to use Java 8 streams and are allergic to while loops, you could try this:
public static int countPattern(String references, Pattern referencePattern) {
Matcher matcher = referencePattern.matcher(references);
return Stream.iterate(0, i -> i + 1)
.filter(i -> !matcher.find())
.findFirst()
.get();
}
Disclaimer: this only works for disjoint matches.
Example:
public static void main(String[] args) {
Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
System.out.println(countPattern("[ ]", referencePattern));
}
This prints out:
2
0
1
0
This is a stream-based solution that also handles overlapping matches (each search restarts at matcher.start() + 1):
public static int countPattern(String references, Pattern referencePattern) {
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
new Iterator<Integer>() {
Matcher matcher = referencePattern.matcher(references);
int from = 0;
@Override
public boolean hasNext() {
return matcher.find(from);
}
@Override
public Integer next() {
from = matcher.start() + 1;
return 1;
}
},
Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}

Use the code below to count the number of matches the regex finds in your input:
int count = 0;
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" here is your predefined regex
Matcher m = p.matcher(input); // "input" is the string to match the pattern against
boolean b = m.matches();
if (b)
count++;
while (m.find())
count++;
This is generalized code rather than a specific solution, so tailor it to suit your needs.
Please feel free to correct me if there is any mistake.

Related

How to find the number of times a specific pattern appears in another pattern using a regular expression?

I'm trying to solve a problem using regular expressions where I need to count how many times one pattern occurs in another, but I have a problem with interleaved (overlapping) occurrences.
For example:
1010 in 10101010
The answer should be 3, but my code gives 2.
int count=0;
Pattern expression = Pattern.compile(s);
Matcher matcher = expression.matcher(scanner.next());
while(matcher.find())
{count++;}
System.out.println(count);
Once a match is found, the next search starts after the end of that match, so overlapping occurrences are skipped and it gets complicated to find all matches your way.
Here is a recursive solution I've modified from an old homework project.
It's probably not optimised, but it works.
public static void main(String[] args)
{
String str1 = "1010";
String str2 = "10101010";
int len = str2.length();
System.out.println(match(str1,str2,len,len,0));
}
static int match(String str1,String str2,int len,int j,int times)
{
if(j == 0)
return times;
Pattern p = Pattern.compile(str1);
Matcher m = p.matcher(str2);
int n = 0;
if(m.find()){
n = m.start()+1;
times++;
}
return match(str1,str2.substring(n,str2.length()),str2.length(),--j,times);
}
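The same idea can be written without recursion. The sketch below is not the answerer's code. It compiles the pattern once and, using Matcher.find(int), restarts each search one character after the position where the previous match began. The class and method names are hypothetical. The bound check on from only matters for patterns that can match an empty string at the very end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InterleavedCount {

    // Counts occurrences of regex in input, including overlapping ones.
    static int countOccurrences(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input); // compiled once
        int count = 0;
        int from = 0;
        // find(from) resets the matcher and searches starting at index from;
        // the bound check avoids an IndexOutOfBoundsException past the end.
        while (from <= input.length() && matcher.find(from)) {
            count++;
            from = matcher.start() + 1; // the next match may overlap this one
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("1010", "10101010")); // 3
    }
}
```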

Pattern matching to match longest substring

I have this regex D+U.
It should match once for the following String UDDDUDUU but with Java it matches three times. It matches DDDU DU. I am using https://regex101.com/ to check my regex, and it should only match once, the DDDU.
I am trying to solve this HackerRank challenge. I am also trying to use Pattern and Matcher because I want to practice using those classes.
What exactly am I doing wrong?
This is my code:
static int match(int n, String s) {
Matcher matcher = Pattern.compile("D+U").matcher(s);
int count = 0;
int i = 0;
while (matcher.find(i)) {
count++;
i = matcher.end() + 1;
}
return count;
}
The + quantifier matches one or more of the preceding character or subexpression, so D+U matches any run of one or more Ds followed by a U.
If you want to return the longest match you could do:
static String match(String s) {
ArrayList<String> matches = new ArrayList<>();
Matcher matcher = Pattern.compile("D+U").matcher(s);
int i = 0;
while (matcher.find(i)) {
matches.add(matcher.group());
i = matcher.end();
}
return Collections.max(matches, Comparator.comparing(c -> c.length()));
}
With the test case UDDDUDUU, this returns DDDU. Also note that I removed the parameter n, as you never used it.
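If Java 9+ is available, the sketch below is a shorter variant of the same idea. It streams the matches with Matcher.results() and picks the longest with Stream.max. Collections.max throws NoSuchElementException on an empty list, whereas this returns an Optional, so an input with no D+U match is handled without an exception. The class and method names are made up.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

class LongestMatch {

    // Longest non-overlapping match of D+U, or Optional.empty() if there is none.
    static Optional<String> longestMatch(String s) {
        return Pattern.compile("D+U").matcher(s)
                .results()                                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("UDDDUDUU").orElse("<none>")); // DDDU
        System.out.println(longestMatch("UUU").orElse("<none>"));      // <none>
    }
}
```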

Count & Split by regex pattern in java

I have a string in below format.
-52/ABC/35/BY/200/L/DEF/307/C/110/L
I need to perform the following.
1. Find the number of occurrences of three-letter words like ABC and DEF in the above text.
2. Split the above string by ABC and DEF as shown below.
ABC/35/BY/200/L
DEF/307/C/110/L
I have tried using a regex with the code below, but the match count is always zero. How can I approach this easily?
static String DEST_STRING = "^[A-Z]{3}$";
static Pattern DEST_PATTERN = Pattern.compile(DEST_STRING,
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
public static void main(String[] args) {
String test = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Matcher destMatcher = DEST_PATTERN.matcher(test);
int destCount = 0;
while (destMatcher.find()) {
destCount++;
}
System.out.println(destCount);
}
Please note I need to use JDK 6 for this.
You can use this code:
public static void main(String[] args) throws Exception {
String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
// Pattern to find all 3 letter words . The \\b means "word boundary", which ensures that the words are of length 3 only.
Pattern p = Pattern.compile("(\\b[a-zA-Z]{3}\\b)");
Matcher m = p.matcher(s);
Map<String, Integer> countMap = new HashMap<String, Integer>(); // explicit type arguments: the diamond operator <> needs Java 7+
// Count how many times each 3 letter word is used.
// Find each 3 letter word.
while (m.find()) {
// Get the 3 letter word.
String val = m.group();
// If the word is present in the map, get old count and add 1, else add new entry in map and set count to 1
if (countMap.containsKey(val)) {
countMap.put(val, countMap.get(val) + 1);
} else {
countMap.put(val, 1);
}
}
System.out.println(countMap);
// Get ABC.. and DEF.. using positive lookahead for a 3 letter word or end of String
// Finds and selects everything starting from a 3 letter word until another 3 letter word is found or until string end is found.
p = Pattern.compile("(\\b[a-zA-Z]{3}\\b.*?)(?=/[A-Za-z]{3}|$)");
m = p.matcher(s);
while (m.find()) {
String val = m.group();
System.out.println(val);
}
}
O/P :
{ABC=1, DEF=1}
ABC/35/BY/200/L
DEF/307/C/110/L
Check this one:
String stringToSearch = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Pattern p1 = Pattern.compile("\\b[a-zA-Z]{3}\\b");
Matcher m = p1.matcher(stringToSearch);
int startIndex = -1;
while (m.find())
{
// Requires Apache Commons Lang's StringUtils; it counts every occurrence of the text, not only whole words
int count = StringUtils.countMatches(stringToSearch, m.group());
System.out.println(m.group() + " : " + count);
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex,m.start()-1));
}
startIndex = m.start();
}
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex));
}
output:
ABC : 1
ABC/35/BY/200/L
DEF : 1
DEF/307/C/110/L
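As for the zero count in the question: without Pattern.MULTILINE, ^ and $ anchor the pattern to the start and end of the whole input (DOTALL and CASE_INSENSITIVE don't change that), so ^[A-Z]{3}$ only matches if the entire string is three letters. The sketch below stays within JDK 6 (no diamond operator, no lambdas). It counts the words using \b word boundaries instead of anchors. It then splits with String.split on a lookahead, so each piece keeps its leading three-letter word. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Written to compile on JDK 6: no diamond operator, no lambdas.
class SplitByCode {

    // A three-letter word with word boundaries on both sides, e.g. ABC or DEF.
    private static final Pattern CODE = Pattern.compile("\\b[A-Za-z]{3}\\b");

    static int countCodes(String s) {
        Matcher m = CODE.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Splits at every '/' that is directly followed by a three-letter word,
    // then keeps only the pieces that start with such a word (here: drops "-52").
    static List<String> splitByCodes(String s) {
        List<String> pieces = new ArrayList<String>();
        for (String piece : s.split("/(?=[A-Za-z]{3}\\b)")) {
            if (CODE.matcher(piece).lookingAt()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
        System.out.println(countCodes(s)); // 2
        for (String piece : splitByCodes(s)) {
            System.out.println(piece);     // ABC/35/BY/200/L, then DEF/307/C/110/L
        }
    }
}
```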

Finding the longest "number sequence" in a string using only a single regex

I want to find a single regex which matches the longest numerical string in a URL.
I.e for the URL: http://stackoverflow.com/1234/questions/123456789/ask, I would like it to return : 123456789
I thought I could use : ([\d]+)
However this returns the first match from the left, not the longest.
Any ideas :) ?
This regex will be used as an input to a strategy pattern, which extracts certain characteristics from urls:
public static String parse(String url, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(url);
if (m.find()) {
return m.group(1);
}
return null;
}
So it would be much tidier if I could use a single regex. :(
Don't use regex. Just iterate the characters:
String longest = "";
int i = 0;
while (i < str.length()) {
while (i < str.length() && !Character.isDigit(str.charAt(i))) {
++i;
}
int start = i;
while (i < str.length() && Character.isDigit(str.charAt(i))) {
++i;
}
if (i - start > longest.length()) {
longest = str.substring(start, i);
}
}
@Andy already gave a non-regex answer, which is probably faster, but if you want to use regex, you must, as @Jan points out, add logic, e.g.:
public String findLongestNumber(String input) {
String longestMatch = "";
int maxLength = 0;
Matcher m = Pattern.compile("([\\d]+)").matcher(input);
while (m.find()) {
String currentMatch = m.group();
int currentLength = currentMatch.length();
if (currentLength > maxLength) {
maxLength = currentLength;
longestMatch = currentMatch;
}
}
return longestMatch;
}
This isn't possible with a pure regex; however, I would do it this way (using Stream.max and a regex):
String url = "http://stackoverflow.com/1234/questions/123456789/ask";
Pattern biggest = Pattern.compile("/(\\d+)/");
Matcher m = biggest.matcher(url);
List<String> matches = new ArrayList<>();
while(m.find()){
matches.add(m.group(1));
}
System.out.println(matches.parallelStream().max((String a, String b) -> Integer.compare(a.length(), b.length())).get());
Will print : 123456789
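A Java 8 variant that still uses just one regex is sketched below. Pattern.splitAsStream splits the URL on runs of non-digits, which leaves the runs of digits as stream elements (plus possibly an empty leading string), and then the longest element is taken. The names are made up. Like the other answers, it compares by length, not by numeric value.

```java
import java.util.Comparator;
import java.util.regex.Pattern;

class LongestNumber {

    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Splitting on runs of non-digits leaves the runs of digits as the elements.
    static String longestNumber(String url) {
        return NON_DIGITS.splitAsStream(url)                  // Java 8+
                .max(Comparator.comparingInt(String::length))
                .orElse("");                                  // stream had no elements
    }

    public static void main(String[] args) {
        System.out.println(longestNumber("http://stackoverflow.com/1234/questions/123456789/ask"));
        // 123456789
    }
}
```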

How to find the number of matches if I use Pattern and Matcher in Java?

I have the following code:
Pattern keyPattern = Pattern.compile(key);
Matcher matcher = keyPattern.matcher(str);
return matcher.replaceAll(value);
This replaces every match of key in str with value, but I want to know how many instances of key have been replaced with value.
How can I find that out?
You can use the find method in a loop to count. There is an example in the Java tutorial:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatcherDemo {
private static final String REGEX = "\\bdog\\b";
private static final String INPUT = "dog dog dog doggie dogg";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
int count = 0;
while(m.find()) {
count++;
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
}
}
While you are calling find, you can at the same time collect the parts of the string and manually build the result in a StringBuffer. Or if performance is not an issue, you can first count and then afterwards scan the string again with replaceAll.
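Here is a minimal sketch of the first suggestion. It uses appendReplacement and appendTail to build the result while counting. It assumes value should be inserted literally, hence Matcher.quoteReplacement; drop that call if you want $1-style group references as with replaceAll(value). The names CountingReplace and replaceAndCount are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountingReplace {

    // Replaces every match of key in str with value (taken literally), appends
    // the result to out and returns the number of replacements made.
    static int replaceAndCount(String str, String key, String value, StringBuffer out) {
        Matcher matcher = Pattern.compile(key).matcher(str);
        String replacement = Matcher.quoteReplacement(value); // treat '$' and '\' literally
        int count = 0;
        while (matcher.find()) {
            // Copies the text between the previous match and this one, then the replacement.
            matcher.appendReplacement(out, replacement);
            count++;
        }
        matcher.appendTail(out); // copies whatever follows the last match
        return count;
    }

    public static void main(String[] args) {
        StringBuffer result = new StringBuffer();
        int n = replaceAndCount("dog dog dog doggie dogg", "\\bdog\\b", "cat", result);
        System.out.println(n + ": " + result); // 3: cat cat cat doggie dogg
    }
}
```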
Instead, you could do:
int num = 0;
Pattern keyPattern = Pattern.compile(key);
Matcher matcher = keyPattern.matcher(str);
// Caution: this loops forever if value itself contains a match for key
while (matcher.find()) {
str = matcher.replaceFirst(value);
matcher = keyPattern.matcher(str); // needed: str has changed, so re-create the matcher
num++;
}
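On Java 9+, Matcher.replaceAll(Function<MatchResult, String>) can do the replacement in a single pass and count along the way, as sketched below. It scans the input only once, so unlike the replaceFirst loop above it cannot loop forever when value itself contains key. The names are hypothetical. Matcher.quoteReplacement again keeps value literal, and Map.entry (Java 9) is just a convenient way to return both results.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassReplace {

    // Returns the replaced text together with the number of replacements made.
    static Map.Entry<String, Integer> replaceAndCount(String str, String key, String value) {
        Matcher matcher = Pattern.compile(key).matcher(str);
        int[] count = {0}; // an array so the lambda can update it
        String replacement = Matcher.quoteReplacement(value);
        String result = matcher.replaceAll(match -> {
            count[0]++;          // called once per match
            return replacement;
        });
        return Map.entry(result, count[0]);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> r = replaceAndCount("dog and dog", "dog", "dogs");
        System.out.println(r.getKey() + " (" + r.getValue() + " replacements)");
        // dogs and dogs (2 replacements)
    }
}
```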
