Java: Counting frequency of a sequence in a string

I'm writing a simple program that counts how many times a sequence appears in a string.
Case 1:
Given string: EATNEATMMMMEAT
Given sequence: EAT
The program should return a value of 3.
Case 2:
Given string: EATEAT
Given sequence: EAT
The program should return 2.
import java.util.*;
public class FrequencyOfSequence { // Finds the frequency of a sequence in a string s
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String s = in.nextLine();
        String sequence = in.nextLine();
        String[] cArr = s.split(sequence);
        System.out.println(cArr.length);
    }
}
My program works in case 1. It fails in case 2 because s.split(sequence) removes both occurrences of 'EAT' and discards the empty strings that remain, leaving an array of length 0.
Is there a way around this?

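Use a regex for this:
// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// Pattern.quote() makes the sequence match literally, even if it contains regex metacharacters
Pattern pattern = Pattern.compile(Pattern.quote(sequence));
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find())
    count++;
System.out.println(count);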

One option is to use replace() to remove the matches and calculate the difference in length:
int count = (s.length() - s.replace(sequence, "").length()) / sequence.length();
If you want to use the split() method, it should work if you use it like this:
int count = s.split(sequence, -1).length - 1;
The -1 argument tells the method not to discard trailing empty strings. Since n occurrences split the string into n + 1 pieces, we then subtract 1 from the length to avoid the fencepost problem. Keep in mind that split() treats its first argument as a regex, so this only works as written when the sequence contains no regex metacharacters.
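The counting approaches above can be compared side by side in a minimal sketch. The class name OccurrenceCounter and its method names are made up for illustration, and Pattern.quote() is an addition (not part of the original answer) so that a sequence containing regex metacharacters is still matched literally by split(). Treating an empty sequence as zero occurrences is also an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answers.
class OccurrenceCounter {

    // Counts non-overlapping occurrences by scanning with indexOf().
    static int countWithIndexOf(String s, String sequence) {
        if (sequence.isEmpty()) {
            return 0; // assumption: an empty sequence counts as zero occurrences
        }
        int count = 0;
        int from = 0;
        while ((from = s.indexOf(sequence, from)) != -1) {
            count++;
            from += sequence.length(); // jump past this match so matches do not overlap
        }
        return count;
    }

    // Counts occurrences with split(): n occurrences yield n + 1 pieces.
    static int countWithSplit(String s, String sequence) {
        if (sequence.isEmpty()) {
            return 0; // same assumption as above
        }
        // limit -1 keeps trailing empty strings; quote() disables regex metacharacters
        return s.split(Pattern.quote(sequence), -1).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(countWithIndexOf("EATNEATMMMMEAT", "EAT")); // prints 3
        System.out.println(countWithSplit("EATEAT", "EAT"));           // prints 2
    }
}
```

Both methods count non-overlapping occurrences from left to right, so they should agree on the two cases from the question.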

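Here is one more option you could try. Note that it still undercounts when the string ends with two or more consecutive occurrences (for EATEAT, split() returns an empty array and the count is 0), because split() discards all trailing empty strings:
int count=0;
if(s.endsWith(sequence)){
    count=s.split(sequence).length;
}
else{
    count=s.split(sequence).length-1;
}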

Related

How to find first occurrence of whitespace (tab+space+etc) in Java?

I have something like this:
System.out.println(some_string.indexOf("\\s+"));
This gives me -1, but when I use a specific value such as \t or a space:
System.out.println(some_string.indexOf("\t"));
I get the correct index.
Is there any way to get the index of the first occurrence of whitespace without using split? My string is very long.
PS: In case it helps, here is my requirement. The string starts with a number, which is followed by a space or tab. I want that first number, and I am trying to avoid split("\\s+")[0].

The point is: indexOf() takes a char, or a string; but not a regular expression.
Thus:
String input = "a\tb";
System.out.println(input);
System.out.println(input.indexOf('\t'));
prints 1 because there is a TAB char at index 1.
System.out.println(input.indexOf("\\s+"));
prints -1 because there is no substring \\s+ in your input value.
In other words: if you want to use the power of regular expressions, you can't use indexOf(). You would rather look towards String.matches(), for example. But of course, that gives a boolean result, not an index.
If you intend to find the index of the first whitespace, you have to iterate the chars manually, like:
for (int index = 0; index < input.length(); index++) {
    if (Character.isWhitespace(input.charAt(index))) {
        return index;
    }
}
return -1;

Something of this sort might help? Though there are better ways to do this.
class Sample{
    public static void main(String[] args) {
        String s = "1110 001";
        int index = -1;
        for(int i = 0; i < s.length(); i++ ){
            if(Character.isWhitespace(s.charAt(i))){
                index = i;
                break;
            }
        }
        System.out.println("Required Index : " + index);
    }
}

Well, to find with a regular expression, you'll need to use the regular expression classes.
Pattern pat = Pattern.compile("\\s");
Matcher m = pat.matcher(s);
if ( m.find() ) {
    System.out.println( "Found \\s at " + m.start());
}
The find method of the Matcher class locates the pattern in the string for which the matcher was created. If it succeeds, the start() method gives you the index of the first character of the match.
Note that you only need to compile the pattern once (you can even store it in a constant). You just have to create a new Matcher for every string.
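As a minimal sketch of that last note, the pattern can be held in a constant and a fresh Matcher created per input. The class WhitespaceFinder and the methods firstWhitespace and firstToken are hypothetical names chosen for illustration; firstToken addresses the asker's stated goal of reading the leading number without split().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class WhitespaceFinder {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Returns the index of the first whitespace character, or -1 if there is none.
    static int firstWhitespace(String s) {
        Matcher m = WHITESPACE.matcher(s); // a new Matcher for every input string
        return m.find() ? m.start() : -1;
    }

    // Returns the text before the first whitespace (the whole string if there is none).
    static String firstToken(String s) {
        int index = firstWhitespace(s);
        return index == -1 ? s : s.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(firstWhitespace("1110\t001 abc")); // prints 4
        System.out.println(firstToken("1110\t001 abc"));      // prints 1110
    }
}
```

Because find() stops at the first match, the rest of a long string is never scanned.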

Search a keyword from an array in an input string and Print them

I have been working on this for the past few days but still cannot get the required output.
I have an array, say
wordlist[]={"One","Two","Three","Four","Five"};
and then I take input from the user.
String input="I have three no, four strings";
Now I want to search the input string for the words in the array wordlist[].
In the example above, the input string contains the words three and four, which are present in the array.
So the program should print the words from the array that appear in the string, and if none of the words in wordlist[] appear, it should print "No Match Found".
Here is my code; I'm stuck at this point.
import java.util.regex.*;
import java.io.*;
class StringSearch{
public static void main(String ...v)throws IOException{
BufferedReader cin = new BufferedReader(new InputStreamReader(System.in));
String wordlist[]={"one","two","three","four","five"};
String input=cin.readLine();
int i,j;
boolean found;
Pattern pat;
Matcher mat;
Pattern spliter=Pattern.compile("[ ,.!]");
String ip[]=spliter.split(input);
System.out.println(ip[2]);
for(i=0; i<wordlist.length;i++){
for(j=0;j<ip.length;j++){
pat=Pattern.compile("\b"+ip[j]+"\b");
mat=pat.matcher(wordlist[i]);
if(){
// No Idea What to write here
}
}
}
}
}

You need to use matches() with the condition input.matches(".*\\b"+wordlist[i]+"\\b.*"), where:
.* : matches anything
\\b : is a word boundary, which avoids matching four inside fourteen
wordlist[i] : is your word
1.) Traverse your array using a loop
2.) Take each word from the array and use matches() with the regex above
String wordlist[]={"one","two","three","four","five"};
String input="I have three no, fourteen strings";
int i;
boolean found=false;
// Traverse your array
for(i=0; i<wordlist.length;i++){
    // match the regex built from the current word against the input
    if(input.matches(".*\\b"+wordlist[i]+"\\b.*")){
        // remember that at least one word matched
        found=true;
        // display the matched word
        System.out.println(wordlist[i]);
    }
}
// if found is still false here, there was no match
if (!found) {
    System.out.println("No Match Found");
}
Output :
three

Using Java8 Streams you can do:
...
import java.util.Arrays;
import java.util.stream.Collectors;
...
String wordlist[]={"one","two","three","four","five"};
String input=cin.readLine();
String foundStrings =
Arrays.stream(wordlist)
.filter(s->input.matches(".*\\b"+s+"\\b.*"))
.collect(Collectors.joining("\n"));
System.out.print(foundStrings.isEmpty() ? "No Match Found\n": foundStrings);

Here we build the regex \\b(one|two|three|four|five)\\b and count the matches the matcher finds.
String wordlist[]={"one","two","three","four","five"};
String input="I have three no, fourteen strings";
StringBuilder regexBuilder = new StringBuilder("\\b").append("(").append(String.join("|", wordlist)).append(")").append("\\b");
String regexExp = regexBuilder.toString();
regexBuilder.setLength(0);
Pattern p = Pattern.compile(regexExp);
Matcher matcher = p.matcher(input);
int count = 0;
while (matcher.find())
{
System.out.println(matcher.group());
count++;
}
if( count == 0){
System.out.println("No Match Found");
}
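The alternation approach can also be wrapped in a reusable method. The sketch below is only one possible shape: the class KeywordFinder and the method findKeywords are hypothetical names, and two choices go beyond the original answers and are assumptions: Pattern.CASE_INSENSITIVE (the question's array holds "One" while the input holds "three") and Pattern.quote() (so that words containing regex metacharacters are matched literally).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class KeywordFinder {

    // Returns the keywords that occur in input as whole words, in order of appearance.
    static List<String> findKeywords(String input, String[] wordlist) {
        List<String> found = new ArrayList<>();
        if (wordlist.length == 0) {
            return found; // an empty alternation would match empty strings
        }
        // Build \b(one|two|...)\b with every word quoted literally.
        String alternation = Arrays.stream(wordlist)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern p = Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group(1)); // the text as it appears in the input
        }
        return found;
    }

    public static void main(String[] args) {
        String[] wordlist = {"One", "Two", "Three", "Four", "Five"};
        List<String> hits = findKeywords("I have three no, four strings", wordlist);
        System.out.println(hits.isEmpty() ? "No Match Found" : String.join("\n", hits));
    }
}
```

Returning a list instead of printing inside the loop leaves the "No Match Found" decision to the caller.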

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get the leading characters, like
AS, MFD
The prefix is not always the first two or three characters, because the input can vary. I need the characters that come before the first number.
Thank you.
Edit: This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}

Here is a nice one-line solution. It uses a regex to capture the leading non-digit characters of the string and then replaces the whole input with that capture. The "A" prefix, which substring(1) strips again, ensures the regex still matches when the input starts with a digit.
public String getFirstLetters(String input) {
    return ("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
            .substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)

A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}

Use the following function to get the required output:
public String getFirstChars(String str){
    int zeroAscii = '0'; int nineAscii = '9';
    String result = "";
    for (int i=0; i< str.length(); i++){
        int ascii = str.charAt(i);
        if(ascii >= zeroAscii && ascii <= nineAscii){
            // first digit reached: everything collected so far is the prefix
            return result;
        }else{
            result = result + str.charAt(i);
        }
    }
    // no digit found: the whole string is the prefix
    return str;
}
Pass your string as the argument.

I think this can be done with a simple regex that matches digits and Java's String.split() method. This approach should be cheaper than solutions that use more complicated regexes, because the split stops after the first match.
Something like the following will work:
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used, "[\\d]+", is already escaped for Java.
What does it do?
It matches one or more digits (\d). In Java, \d matches only the ASCII digits 0-9 by default; it also matches digits from other Unicode scripts only if the pattern is compiled with the UNICODE_CHARACTER_CLASS flag.
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and splits the string into an array of strings wherever a match is found. The [0] at the end selects the first element of that array, since you wanted the starting chars.
What is the second parameter to .split(regex,int), which I supplied as 2?
This is the limit parameter. With a limit of 2, the regex is applied until one match is found; after that, the rest of the string is not processed any further.
From the String Javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will be efficient if your string is huge.
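To make the effect of the limit parameter concrete, here is a small sketch that compares split() with and without a limit. The class SplitLimitDemo and the method prefixBeforeDigits are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class for illustration only.
class SplitLimitDemo {

    // Returns the characters before the first run of digits
    // (the whole string if it contains no digit).
    static String prefixBeforeDigits(String input) {
        // limit 2: the pattern is applied at most once
        return input.split("[\\d]+", 2)[0];
    }

    public static void main(String[] args) {
        String input = "AS23456SDE9X";
        // With limit 2 the remainder stays unsplit in the last element.
        System.out.println(Arrays.toString(input.split("[\\d]+", 2))); // prints [AS, SDE9X]
        // Without a limit every run of digits is a delimiter.
        System.out.println(Arrays.toString(input.split("[\\d]+")));    // prints [AS, SDE, X]
        System.out.println(prefixBeforeDigits("MFD324FR"));            // prints MFD
    }
}
```

When the input starts with a digit, the first array element is the empty string, so the method returns an empty prefix.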
Another possible regex, which explicitly matches only the ASCII digits 0-9:
"[0-9]+"

public static void main(String[] args) {
    String testString = "MFD324FR";
    int index = 0;
    for (Character i : testString.toCharArray()) {
        if (Character.isDigit(i))
            break;
        index++;
    }
    System.out.println(testString.substring(0, index));
}
This prints the characters that come before the first digit in the string.

pattern matching using regular expressions replace by digits

My program takes a long string from the user, such as aaaabaaaaaba, and should replace every aaa with 0 and every aba with 1.
The string is read in consecutive groups of three characters, and one group must not run into the next. For example, aaaabaaabaaaaba is read as aaa-aba-aab-aaa-aba; each group is matched individually and the matches must not overlap.
Please help me get this program working.
Example: for the input aaaabaaaaaba the output is 0101.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Pattern1 {
Scanner sc =new Scanner(System.in);
public void m1()
{ String s;
System.out.println("enter a string");
s=sc.nextLine();
assertTrue(s!=null);
Pattern p = Pattern.compile(s);
Matcher m =p.matcher(".(aaa");
Matcher m1 =p.matcher("aba");
while(m.find())
{
s.replaceAll(s, "1");
}
while(m1.find())
{
s.replaceAll(s, "0");
}
System.out.println(s);
}
private boolean assertTrue(boolean b) {
return b;
// TODO Auto-generated method stub
}
public static void main(String[] args) {
Pattern1 p = new Pattern1();
p.m1();
}
}

With regex and find you can search for each successive match and then add a 0 or 1 depending on the characters to the output.
String test = "aaaabaaaaabaaaa";
Pattern compile = Pattern.compile("(?<triplet>(aaa)|(aba))");
Matcher matcher = compile.matcher(test);
StringBuilder out = new StringBuilder();
int start = 0;
while (matcher.find(start)) {
String triplet = matcher.group("triplet");
switch (triplet) {
case "aaa":
out.append("0");
break;
case "aba":
out.append("1");
break;
}
start = matcher.end();
}
System.out.println(out.toString());
If you have "aaaaaba" as input (one "a" too many after the first triplet), the matcher skips the extra "a" and the output is "01". So any invalid characters between valid triplets are ignored.
If you want to go through the string in blocks of 3, you can use a for loop and the substring() method like this:
String test = "aaaabaaaaabaaaa";
StringBuilder out = new StringBuilder();
for (int i = 0; i < test.length() - 2; i += 3) {
String triplet = test.substring(i, i + 3);
switch (triplet) {
case "aaa":
out.append("0");
break;
case "aba":
out.append("1");
break;
}
}
System.out.println(out.toString());
In this case, if a triplet is invalid, it will just be ignored and neither a "0" nor a "1" will be added to the output. If you want to do something in this case, just add a default clause to the switch statement.
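As one possible use of that default clause, the sketch below rejects invalid input instead of silently skipping it. The class TripletEncoder, the method encode, and the decision to throw IllegalArgumentException (including for lengths that are not a multiple of 3) are assumptions made for this example, not part of the original answer.

```java
// Hypothetical strict variant of the block-wise loop, for illustration only.
class TripletEncoder {

    // Encodes "aaa" as 0 and "aba" as 1, reading the input in blocks of 3.
    static String encode(String input) {
        if (input.length() % 3 != 0) {
            // assumption: leftover characters are treated as an error
            throw new IllegalArgumentException("length must be a multiple of 3: " + input);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i += 3) {
            String triplet = input.substring(i, i + 3);
            switch (triplet) {
                case "aaa":
                    out.append('0');
                    break;
                case "aba":
                    out.append('1');
                    break;
                default:
                    // the default clause handles every triplet that is neither aaa nor aba
                    throw new IllegalArgumentException(
                            "invalid triplet '" + triplet + "' at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaabaaaaaba")); // prints 0101
    }
}
```

A caller that prefers a placeholder character over an exception could append it in the default branch instead.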

Here's what I understand from your question:
The user string will be some sequence of the tokens "aaa" and "aba"
There will be no other combinations of 'a' and 'b'. For example, you will not get "aaabaa" as an input string, as "baa" is invalid.
For each consecutive 3 character string, replace "aaa" with 0 and "aba" with 1.
I'm guessing that this is a homework assignment designed to teach you about the dangers of catastrophic backtracking and how to carefully use quantifiers.
My suggestion would be to do this in two parts:
Identify and replace each 3-letter segment with a single character.
Replace those characters with the appropriate value. ('1' or '0')
For example, first construct a pattern like a([ab])a to capture the character ('a' or 'b') between two 'a's. Then, use the Matcher class's replaceAll method to replace each match with the captured character. So, for the input aaaabaaaaaba you get abab as a result. Finally, replace all 'a' with '0' and all 'b' with '1'.
In Java:
// Create the matcher to identify triplets in the form "aaa" or "aba"
Matcher tripletMatcher = Pattern.compile("a([ab])a").matcher(inputString);
// Replace each triplet with the middle letter, then replace 'a' and 'b' properly.
String result = tripletMatcher.replaceAll("$1").replace('a', '0').replace('b', '1');
There are better ways of doing this, of course, but this should work. I've left the code intentionally dense and hard to read quickly. So, if this is a homework assignment, make sure you understand it fully and then rewrite it yourself.
Also, keep in mind that this will not work if the input string isn't a sequence of "aaa" and "aba". Any other combination, such as "baa" or "abb", is left partly unmatched and produces wrong output. For example, ababaa, aababa, and aaabab will all give unexpected and potentially incorrect results.

Not getting desired results with multiple regex matching in same string

I have a unique problem statement where I have to perform regex on an input string using triple characters. e.g. if my input is ABCDEFGHI, a pattern search for BCD should return false since I am treating my input as ABC+DEF+GHI and need to compare my regex pattern with these triple characters.
Similarly, regex pattern DEF will return true since it matches one of the triplets. Using this problem statement, assume that my input is QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH and I am trying to get all output strings that start with triplet ABC and end with XYZ. So, in above input, my outputs should be two strings: ABCPOIUYTREWXYZ and ABCMNBVCXZASXYZ.
Also, I have to store these strings in an ArrayList. Below is my function:
public static void newFindMatches (String text, String startRegex, String endRegex, List<String> output) {
int startPos = 0;
int endPos = 0;
int i = 0;
// Making sure that substrings are always valid
while ( i < text.length()-2) {
// Substring for comparing triplets
String subText = text.substring(i, i+3);
Pattern startP = Pattern.compile(startRegex);
Pattern endP = Pattern.compile(endRegex);
Matcher startM = startP.matcher(subText);
if (startM.find()) {
// If a match is found, set the start position
startPos = i;
for (int j = i; j < text.length()-2; j+=3) {
String subText2 = text.substring(j, j+3);
Matcher endM = endP.matcher(subText2);
if (endM.find()) {
// If match for end pattern is found, set the end position
endPos = j+3;
// Add the string between start and end positions to ArrayList
output.add(text.substring(startPos, endPos));
i = j;
}
}
}
i = i+3;
}
}
Upon running this function in main as follows:
String input = "QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH";
String start = "ABC";
String end = "XYZ";
List<String> results = new ArrayList<String> ();
newFindMatches(input, start, end, results);
for (int x = 0; x < results.size(); x++) {
System.out.println("Output String number "+(x+1)+" is: "+results.get(x));
}
I get the following output:
Output String number 1 is: ABCPOIUYTREWXYZ
Output String number 2 is: ABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZ
Notice that the first string is correct. For the second string, however, the program is again reading from the start of the input string. Instead, I want the program to continue reading after the last end pattern, i.e. skip the first match and the unwanted characters such as ASDFGHJKL, and print only ABCMNBVCXZASXYZ as the second string.
Thanks for your responses.

The problem here is that when you find your end match (the if statement within the for loop), you don't stop the for loop. So it just keeps looking for more end matches until it hits the for-loop end condition j < text.length()-2. When you find your match and process it, you should end the loop using "break;". Place "break;" after the i=j line.
Note that, technically, the second answer your current program gave you is correct: it is also a substring that begins with ABC and ends with XYZ. You might want to rethink what the correct output for your program should be. If you do want every such substring, leave out both the break and the i=j assignment, so that i is only ever incremented by the i=i+3 line as it iterates across the triplets.
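Applied to the function from the question, the break fix might look like the following sketch. The class name TripletMatcher is hypothetical, and returning the list (rather than filling a list passed in by the caller) and compiling both patterns once before the loops are small liberties taken here for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rewrite of the question's newFindMatches, for illustration only.
class TripletMatcher {

    // Collects substrings that start at a triplet matching startRegex and end at
    // the first following triplet matching endRegex.
    static List<String> findMatches(String text, String startRegex, String endRegex) {
        List<String> output = new ArrayList<>();
        Pattern startP = Pattern.compile(startRegex); // compiled once, outside the loops
        Pattern endP = Pattern.compile(endRegex);
        int i = 0;
        while (i + 3 <= text.length()) {
            if (startP.matcher(text.substring(i, i + 3)).find()) {
                for (int j = i; j + 3 <= text.length(); j += 3) {
                    if (endP.matcher(text.substring(j, j + 3)).find()) {
                        output.add(text.substring(i, j + 3));
                        i = j;  // resume the outer scan after the end triplet
                        break;  // stop at the first end triplet instead of scanning on
                    }
                }
            }
            i += 3;
        }
        return output;
    }

    public static void main(String[] args) {
        String input = "QWEABCPOIUYTREWXYZASDFGHJKLABCMNBVCXZASXYZFGH";
        for (String match : findMatches(input, "ABC", "XYZ")) {
            System.out.println(match);
        }
    }
}
```

For the input from the question this should print ABCPOIUYTREWXYZ followed by ABCMNBVCXZASXYZ.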
