Java regex - one liner for counting matches

Is there a one liner to replace the while loop?
String formatSpecifier = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";
Pattern pattern = Pattern.compile(formatSpecifier);
Matcher matcher = pattern.matcher("hello %s my name is %s");
// CAN BELOW BE REPLACED WITH A ONE LINER?
int counter = 0;
while (matcher.find()) {
counter++;
}

Personally I don't see any reason to aim for a one-liner, given that the original code is already easy to understand. Anyway, here are several ways if you insist:
1. Make a helper method
Write something like this
static int countMatches(Matcher matcher) {
int counter = 0;
while (matcher.find())
counter++;
return counter;
}
so your code becomes
int counter = countMatches(matcher);
2. Java 9
Matcher in Java 9 provides results(), which returns a Stream<MatchResult>. Stream.count() returns a long, so the result needs a cast (or a long variable). So your code becomes
int counter = (int) matcher.results().count();
3. String Replace
Similar to what the other answer suggests.
Here I replace each match with the null character (which almost never appears in a normal string) and do the counting with split.
Your code becomes:
int counter = matcher.replaceAll("\0").split("\0", -1).length - 1;
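For reference, here is a minimal sketch of the Java 9 variant as a complete program. The class name MatchCountDemo, the method name countSpecifiers and the use of Math.toIntExact are choices made for this illustration, not part of the answer above; the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCountDemo {
    // Same format-specifier pattern as in the question, compiled once
    static final Pattern FORMAT_SPECIFIER = Pattern.compile(
            "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    // Counts matches with Matcher.results(), which requires Java 9 or later
    static int countSpecifiers(String text) {
        Matcher matcher = FORMAT_SPECIFIER.matcher(text);
        // count() returns a long; toIntExact throws ArithmeticException on overflow
        return Math.toIntExact(matcher.results().count());
    }

    public static void main(String[] args) {
        System.out.println(countSpecifiers("hello %s my name is %s")); // prints 2
    }
}
```

Math.toIntExact is used instead of a plain cast so that an overflow would fail loudly rather than wrap around; for realistic inputs either works.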

Yes: replace any occurrence by a char that can be neither in the pattern nor in the string to match, then count the number of occurrences of this char.
Here I choose X, for the example to be simple. You should use a char not so often used (see UTF-8 special chars for instance).
final int counter = pattern.matcher("hello %s my name is %s").replaceAll("X").replaceAll("[^X]", "").length();
Value computed for counter is 2 with your example.

Related

Find out recursive pattern in a string java

In one of my interviews I was asked to write a Java string program, and I was unable to answer it. I don't know whether it is a simple program or a complex one. I have searched the internet for it but could not find an exact solution. My question is as follows.
Suppose I have a string which contains a recursive pattern, like
String str1 = "abcabcabc";
In the above string the recursive pattern is "abc", because the string consists only of "abc" repeated.
If I pass this string to a function/method as a parameter, it should return "This string has a recursive pattern." If the string doesn't have any recursive pattern, it should simply return "This string doesn't contain the recursive pattern."
Following are the possibilities,
String str1 = "abcabcabc";
//This string contains recursive pattern 'abc'
String str2 = "abcabcabcabdac";
//This string doesn't contain a recursive pattern
String str3 = "abcddabcddabcddddabc";
//This string contains recursive pattern 'abc' & 'dd'
Can anybody suggest a solution/algorithm for this? I am struggling with it. What is the best way to handle the different possibilities?

From LeetCode
public boolean repeatedSubstringPattern(String str) {
int l = str.length();
for(int i=l/2;i>=1;i--) {
if(l%i==0) {
int m = l/i;
String subS = str.substring(0,i);
StringBuilder sb = new StringBuilder();
for(int j=0;j<m;j++) {
sb.append(subS);
}
if(sb.toString().equals(str)) return true;
}
}
return false;
}
The length of the repeating substring must be a divisor of the length of the input string.
Search all possible divisors of str.length(), starting from length/2.
If i is a divisor of the length, repeat the substring from 0 to i as many times as it fits into str.length() (that is, l/i times).
If the repeated substring is equal to the input str, return true.
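As a shorter alternative to the divisor loop (a different, commonly cited technique, not the one in the answer above), a string is made of repeated copies of a shorter block exactly when it occurs inside its own doubling at a position other than 0 or its length. A minimal sketch, with the class and method names invented for this example:

```java
class RepeatedPatternCheck {
    // True when s consists of two or more copies of some shorter substring
    static boolean hasRepeatedPattern(String s) {
        if (s.isEmpty()) {
            return false;
        }
        // Search for s inside s + s, starting at index 1 to skip the trivial match at 0.
        // A match before index s.length() means s equals a proper rotation of itself,
        // which only happens when s is periodic.
        return (s + s).indexOf(s, 1) < s.length();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedPattern("abcabcabc"));      // prints true
        System.out.println(hasRepeatedPattern("abcabcabcabdac")); // prints false
    }
}
```

Like the LeetCode version, this only recognises a single block repeated end to end, so the third example from the question ("abcddabcddabcddddabc") is reported as not repeating.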

The solution is not in Java. However, the problem looked interesting, so I attempted to solve it in Python. Apologies!
In Python, I wrote some logic that worked. [It could be written much better, but I thought the logic would help you.]
The script is
def check(lst):
    # every piece must be contained in the last (largest after sorting) piece
    return all(x in lst[-1] for x in lst)

# input() is Python 3; on Python 2 use raw_input() instead
s = input("Enter string:: ")
# split on the first character and compare the pieces that follow it
if check(sorted(s.split(s[0])[1:])):
    print("String, {} is recursive".format(s))
else:
    print("String, {} is NOT recursive".format(s))
Output of the script:
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcabcabcabdac
String, abcabcabcabdac is NOT recursive
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcabcabc
String, abcabcabc is recursive
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcddabcddabcddddabc
String, abcddabcddabcddddabc is recursive

This can also be solved using a part of the Knuth–Morris–Pratt Algorithm.
The idea is to build a 1-D array with one entry per character of the word. For each index i we record the length of the longest proper prefix of the word that is also a suffix of the substring from 0 to i. The reason is that when a prefix and a suffix coincide, the search can resume from the character right after that prefix, and the array stores that index.
For s="abcabcabc", the array will be
Index : 0 1 2 3 4 5 6 7 8
String: a b c a b c a b c
KMP   : 0 0 0 1 2 3 4 5 6
For Index = 2, no suffix is also a prefix of the string abc (i.e. up to Index = 2), so KMP[2] = 0.
For Index = 4, the suffix ab (Index = 3, 4) is the same as the prefix ab (Index = 0, 1), so we set KMP[4] = 2, which is the index in the pattern from which we have to resume searching.
Thus KMP[i] holds the length of the longest prefix of s that matches a suffix of the range 0 to i, which is also the index just after that prefix. This essentially means that a block of length index + 1 - KMP[index] has occurred earlier in the string. Using this information we can find out whether all the substrings of that length are the same.
For Index = 8, we know KMP[index] = 6, which means there is a block (s[3] to s[5]) of length 9 - 6 = 3 which is equal to the suffix (s[6] to s[8]). If this is the only repetitive pattern, every block of length 3 in the string is the same.
For a clearer explanation of this algorithm please check this video lecture.
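This table can be built in linear time,
#include <string>
#include <vector>
using namespace std;

vector<int> buildKMPtable(const string& word)
{
    vector<int> kmp(word.size(), 0);
    int j = 0; // length of the prefix currently matched
    for(size_t i = 1; i < word.size(); ++i)
    {
        // On a mismatch fall back to shorter prefixes; stop at j == 0 so kmp[j-1] is never read out of bounds
        while(j > 0 && word[j] != word[i])
        {
            j = kmp[j-1];
        }
        if(word[j] == word[i])
        {
            ++j;
        }
        kmp[i] = j;
    }
    return kmp;
}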
bool repeatedSubstringPattern(string s) {
    if(s.empty()) // Guard: an empty string has no last table entry to inspect
    {
        return false;
    }
    auto kmp = buildKMPtable(s);
    if(kmp[s.size() - 1] == 0) // No prefix of the string is also a suffix ending at the last character
    {
        return false;
    }
    int diff = s.size() - kmp[s.size() - 1]; // Length of the repetitive pattern
    if(s.size() % diff != 0) // Length of the repetitive pattern must be a divisor of the size of the string
    {
        return false;
    }
    // Check if that repetitive pattern is the only repetitive pattern.
    string word = s.substr(0, diff);
    int w_size = word.size();
    for(int i = 0; i < w_size; ++i)
    {
        int j = i;
        while(j < (int)s.size())
        {
            if(word[i] == s[j])
            {
                j += w_size;
            }
            else
            {
                return false;
            }
        }
    }
    return true;
}
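Since the question is about Java, here is a rough Java sketch of the same idea. The class and method names are made up for this example. It relies on the standard property of the prefix table that a non-zero last entry together with a period that divides the length already implies a full repetition, so the final character-by-character loop of the C++ version is left out.

```java
class KmpRepeatCheck {
    // table[i] = length of the longest proper prefix of word
    // that is also a suffix of word[0..i]
    static int[] buildPrefixTable(String word) {
        int[] table = new int[word.length()];
        int j = 0; // length of the prefix currently matched
        for (int i = 1; i < word.length(); i++) {
            // On a mismatch, fall back to the next shorter candidate prefix
            while (j > 0 && word.charAt(i) != word.charAt(j)) {
                j = table[j - 1];
            }
            if (word.charAt(i) == word.charAt(j)) {
                j++;
            }
            table[i] = j;
        }
        return table;
    }

    // True when s is some shorter block repeated two or more times
    static boolean repeatedSubstringPattern(String s) {
        int n = s.length();
        if (n == 0) {
            return false;
        }
        int last = buildPrefixTable(s)[n - 1];
        int period = n - last; // candidate length of the repeating block
        return last > 0 && n % period == 0;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSubstringPattern("abcabcabc"));      // prints true
        System.out.println(repeatedSubstringPattern("abcabcabcabdac")); // prints false
    }
}
```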

If you know the 'parts' in advance, then the answer could be Recursive regular expressions, it seems.
So for abcabcabc we need an expression like abc(?R)* where:
abc matches the literal characters
(?R) recurses the pattern
A * to match between zero and unlimited number of times
The third one is a little trickier. See this regex101 link but it looks like:
((abc)|(dd))(?R)*
where we have either 'abc' or 'dd' and there are any number of these.
Otherwise, I don't see how you could determine from just a string that it has some undefined recursive structure like this.
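One caveat for Java specifically: java.util.regex has no (?R) recursion construct, so the expressions above cannot be used as written. For these two cases a quantified group describes the same set of strings, which can be sketched as follows (class and method names are invented for the example, and the parts "abc" and "dd" are assumed to be known in advance, as in the answer):

```java
class RepeatedPartsRegex {
    // True when the whole input is one or more copies of "abc";
    // String.matches() requires the entire string to match the pattern
    static boolean onlyAbc(String s) {
        return s.matches("(?:abc)+");
    }

    // True when the whole input is built only from the parts "abc" and "dd"
    static boolean onlyAbcOrDd(String s) {
        return s.matches("(?:abc|dd)+");
    }

    public static void main(String[] args) {
        System.out.println(onlyAbc("abcabcabc"));                // prints true
        System.out.println(onlyAbcOrDd("abcddabcddabcddddabc")); // prints true
        System.out.println(onlyAbcOrDd("abcabcabcabdac"));       // prints false
    }
}
```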

How to find first occurrence of whitespace (tab+space+etc) in java?

So I have something like this
System.out.println(some_string.indexOf("\\s+"));
this gives me -1
but when I do with specific value like \t or space
System.out.println(some_string.indexOf("\t"));
I get the correct index.
Is there any way I can get the index of the first occurrence of whitespace without using split? My string is very long.
PS: if it helps, here is my requirement. I want the first number in the string, which is separated from the rest of the string by a tab or space, and I am trying to avoid split("\\s+")[0]. The string starts with that number and has a space or tab after the number ends.

The point is: indexOf() takes a char, or a string; but not a regular expression.
Thus:
String input = "a\tb";
System.out.println(input);
System.out.println(input.indexOf('\t'));
prints 1 because there is a TAB char at index 1.
System.out.println(input.indexOf("\\s+"));
prints -1 because there is no substring \\s+ in your input value.
In other words: if you want the power of regular expressions, you can't use indexOf(). You would rather be looking at String.matches(), for example. But of course, that gives a boolean result, not an index.
If you intend to find the index of the first whitespace without a regex, you can iterate over the chars manually, like:
for (int index = 0; index < input.length(); index++) {
if (Character.isWhitespace(input.charAt(index))) {
return index;
}
}
return -1;

Something of this sort might help? Though there are better ways to do this.
class Sample{
public static void main(String[] args) {
String s = "1110 001";
int index = -1;
for(int i = 0; i < s.length(); i++ ){
if(Character.isWhitespace(s.charAt(i))){
index = i;
break;
}
}
System.out.println("Required Index : " + index);
}
}

Well, to find with a regular expression, you'll need to use the regular expression classes.
Pattern pat = Pattern.compile("\\s");
Matcher m = pat.matcher(s);
if ( m.find() ) {
System.out.println( "Found \\s at " + m.start());
}
The find method of the Matcher class locates the pattern in the string for which the matcher was created. If it succeeds, the start() method gives you the index of the first character of the match.
Note that you can compile the pattern only once (even create a constant). You just have to create a Matcher for every string.
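Putting the last remark into practice, the pattern can live in a constant and the lookup in a small helper. This is only a sketch; the names FirstWhitespace, firstWhitespaceIndex and leadingToken are invented here, and leadingToken is one possible way to get the "first number" from the question without split.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWhitespace {
    // Compiled once and reused for every input string
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Index of the first whitespace character, or -1 if there is none
    static int firstWhitespaceIndex(String s) {
        Matcher m = WHITESPACE.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // Text before the first whitespace, or the whole string if there is none
    static String leadingToken(String s) {
        int end = firstWhitespaceIndex(s);
        return end == -1 ? s : s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(firstWhitespaceIndex("a\tb"));  // prints 1
        System.out.println(leadingToken("1110 001"));      // prints 1110
    }
}
```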

Java: Counting frequency of a sequence in a string

I'm writing a simple program that counts how many times a sequence appears in a string.
Case 1:
Given string: EATNEATMMMMEAT
Given sequence: EAT
The program should return a value of 3.
Case 2:
Given string: EATEAT
Given sequence: EAT
The program should return 2.
import java.util.*;
public class FrequencyOfSequence { //Finds the frequency of a sequence in a string s
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String s = in.nextLine();
String sequence = in.nextLine();
String[] cArr = s.split(sequence);
System.out.println(cArr.length);
}
}
My program works in case 1. It fails on the 2nd case because s.split(sequence) removes both 'EAT', leaving an array of size 0.
Is there a way around this?

Use a regex for this (quote the sequence so that any regex metacharacters in it are matched literally):
Pattern pattern = Pattern.compile(Pattern.quote(sequence));
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find())
count++;
System.out.println(count);

One option is to use replace() to remove the matches and calculate the difference in size:
int count = (s.length() - s.replace(sequence, "").length()) / sequence.length();
If you want to use the split() method, it should work if you use it like this:
int count = s.split(sequence, -1).length - 1;
The -1 argument tells the method not to discard trailing empty strings. Then we subtract 1 from the length to avoid the fencepost problem.
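Both variants can be wrapped in small helpers and checked against the two cases from the question. This is a sketch with made-up names; it assumes a non-empty sequence, and it adds Pattern.quote to the split variant because split() interprets its argument as a regular expression.

```java
import java.util.regex.Pattern;

class SequenceCounter {
    // Counts non-overlapping occurrences by measuring how much text replace() removes
    static int countByReplace(String s, String sequence) {
        return (s.length() - s.replace(sequence, "").length()) / sequence.length();
    }

    // Counts occurrences via split(); the limit -1 keeps trailing empty strings,
    // and Pattern.quote makes the sequence match literally
    static int countBySplit(String s, String sequence) {
        return s.split(Pattern.quote(sequence), -1).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(countByReplace("EATNEATMMMMEAT", "EAT")); // prints 3
        System.out.println(countBySplit("EATEAT", "EAT"));           // prints 2
    }
}
```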

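Here is one more option you might try. Note that it miscounts when the string ends with two or more consecutive matches (for "EATEAT" it yields 0), because split() drops all trailing empty strings; passing a limit of -1 as shown above avoids that.
int count = 0;
if (s.endsWith(sequence)) {
    count = s.split(sequence).length;
} else {
    count = s.split(sequence).length - 1;
}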

Count the borders between zeros inside a String in Java

I have a special kind of Strings in Java that has a sequence of zeros and some short sequence of characters between them like those:
"0000000000TT0000TU0000U0"
"0000000000TL"
"0000000000TL0000TM"
I want to count the number of sequences that are different from zeros.
for example:
"0000000000TT0000TU0000U0" will return 3
"0000000000TL" will return 1
"0000000000TL0000TM" will return 2
"000000" will return 0.
Is there any short and easy way to do it (maybe some built-in Java String option or a regex of some kind)?
Thanks

Use a negated character class to match any run of characters other than 0.
Matcher m = Pattern.compile("[^0]+").matcher(s);
int i = 0;
while(m.find()) {
i = i + 1;
}
System.out.println("Total count " + i);
DEMO
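Wrapped in a method, the same loop can be checked against all four examples from the question. The class and method names below are invented for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonZeroRuns {
    // One or more consecutive characters that are not '0'
    private static final Pattern NON_ZERO_RUN = Pattern.compile("[^0]+");

    // Counts the maximal runs of non-zero characters in s
    static int countRuns(String s) {
        Matcher m = NON_ZERO_RUN.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRuns("0000000000TT0000TU0000U0")); // prints 3
        System.out.println(countRuns("000000"));                   // prints 0
    }
}
```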

Java: Finding the number of word matches in a given string

I am trying to find the number of word matches for a given string and keyword combination, like this:
public int matches(String keyword, String text){
// ...
}
Example:
Given the following calls:
System.out.println(matches("t", "Today is really great, isn't that GREAT?"));
System.out.println(matches("great", "Today is really great, isn't that GREAT?"));
The result should be:
0
2
So far I found this: Find a complete word in a string java
This only returns if the given keyword exists but not how many occurrences. Also, I am not sure if it ignores case sensitivity (which is important for me).
Remember that substrings should be ignored! I only want full words to be found.
UPDATE
I forgot to mention that I also want keywords that are separated via whitespace to match.
E.g.
matches("today is", "Today is really great, isn't that GREAT?")
should return 1

Use a regular expression with word boundaries. It's by far the easiest choice.
int matches = 0;
Matcher matcher = Pattern.compile("\\bgreat\\b", Pattern.CASE_INSENSITIVE).matcher(text);
while (matcher.find()) matches++;
Your mileage may vary with some foreign languages, though.
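To fit the matches(keyword, text) signature from the question, the keyword has to be quoted and built into the pattern. The sketch below is one possible way to do that, not the exact code of the answer: it swaps \b for lookarounds that also treat an apostrophe as part of a word, because plain \b would count the "t" in "isn't" as a word, and it lets any run of whitespace separate the parts of a multi-word keyword. The class name is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchCounter {
    // Counts case-insensitive whole-word occurrences of keyword in text
    static int matches(String keyword, String text) {
        String[] parts = keyword.trim().split("\\s+");
        // Assumption of this sketch: a "word" may contain word characters and apostrophes,
        // so a match must not be preceded or followed by either
        StringBuilder regex = new StringBuilder("(?<![\\w'])");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("\\s+"); // any whitespace between keyword parts
            }
            regex.append(Pattern.quote(parts[i])); // match each part literally
        }
        regex.append("(?![\\w'])");

        Matcher matcher = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Today is really great, isn't that GREAT?";
        System.out.println(matches("t", text));        // prints 0
        System.out.println(matches("great", text));    // prints 2
        System.out.println(matches("today is", text)); // prints 1
    }
}
```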

How about taking advantage of indexOf? (Both versions below assume s2 is not empty.)
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
int count = 0;
int x;
int y = s2.length();
while ((x = s1.indexOf(s2)) != -1) {
    count++;
    // continue searching after the end of this match
    s1 = s1.substring(x + y);
}
return count;
Efficient version
int count = 0;
int y = s2.length();
for (int i = 0; i <= s1.length() - y; i++) {
    int j = 0;
    // compare s2 against s1 starting at position i
    while (j < y && s1.charAt(i + j) == s2.charAt(j)) {
        j++;
    }
    if (j == y) {
        count++;
        i += y - 1; // skip past this match so occurrences do not overlap
    }
}
return count;
For a more efficient solution, you will have to modify the KMP algorithm a little. Just google it, it's simple.

Well, you can use "split" to separate the words and then check whether any word matches the keyword exactly.
Hope that helps!

One option would be RegEx. Basically it sounds like you are looking to match a word with any punctuation on the left or right. So:
" great."
" great!"
" great "
" great,"
"Great"
would all match, but
"greatest"
wouldn't
