Find out recursive pattern in a string java - java

In an interview I was asked to write a Java program on strings, and I was unable to answer it. I don't know whether it is a simple program or a complex one. I have searched the internet but could not find an exact solution. My question is as follows.
Suppose I have a string which contains a recursive (repeating) pattern, like:
String str1 = "abcabcabc";
In the above string the recursive pattern is "abc", because the string consists only of "abc" repeated.
If I pass this string to a function/method as a parameter, it should return "This string has a recursive pattern." If the string doesn't have any recursive pattern, the function/method should simply return "This string doesn't contain the recursive pattern."
The possible cases are:
String str1 = "abcabcabc";
//This string contains the recursive pattern 'abc'
String str2 = "abcabcabcabdac";
//This string doesn't contain a recursive pattern
String str3 = "abcddabcddabcddddabc";
//This string contains the recursive patterns 'abc' & 'dd'
Can anybody suggest a solution/algorithm for this? I am struggling with it. What is the best way to handle these different cases?

From LeetCode
public boolean repeatedSubstringPattern(String str) {
    int l = str.length();
    for(int i=l/2;i>=1;i--) {
        if(l%i==0) {
            int m = l/i;
            String subS = str.substring(0,i);
            StringBuilder sb = new StringBuilder();
            for(int j=0;j<m;j++) {
                sb.append(subS);
            }
            if(sb.toString().equals(str)) return true;
        }
    }
    return false;
}
The length of the repeating substring must be a divisor of the length of the input string.
Search all possible divisors of str.length(), starting from length/2.
If i is a divisor of the length, repeat the substring from 0 to i as many times as i fits into str.length().
If the repeated substring equals the input str, return true.
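The divisor idea above can also be sketched so that it reports the repeating unit and produces the two messages the question asks for. This is only an illustration under assumptions: the class name RepeatedPatternDemo and the helpers smallestRepeatingUnit and describe are made up here, and the comparison uses String.regionMatches in place of rebuilding the string with a StringBuilder.

```java
final class RepeatedPatternDemo {
    // Hypothetical helper: returns the shortest unit whose repetition forms s,
    // or null if s is not a pure repetition of a shorter unit.
    static String smallestRepeatingUnit(String s) {
        int n = s.length();
        for (int len = 1; len <= n / 2; len++) {
            if (n % len != 0) {
                continue; // the unit length must divide the total length
            }
            // s has period len exactly when s[len..n) equals s[0..n-len)
            if (s.regionMatches(len, s, 0, n - len)) {
                return s.substring(0, len);
            }
        }
        return null;
    }

    // Hypothetical helper: maps the result to the messages from the question.
    static String describe(String s) {
        return smallestRepeatingUnit(s) != null
                ? "This string has a recursive pattern."
                : "This string doesn't contain the recursive pattern.";
    }

    public static void main(String[] args) {
        System.out.println(describe("abcabcabc"));
        System.out.println(describe("abcabcabcabdac"));
    }
}
```

Note that the third example from the question, "abcddabcddabcddddabc", is not a repetition of a single unit, so a check of this kind reports no pattern for it; handling it requires knowing the parts ('abc' and 'dd') in advance.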

This solution is not in Java. However, the problem looked interesting, so I attempted to solve it in Python. Apologies!
I wrote some logic in Python which worked (it could be written much better, but I thought the logic would help you).
The script is:
def check(lst):
    return all(x in lst[-1] for x in lst)
s = raw_input("Enter string:: ")  # Python 2; use input() on Python 3
if check(sorted(s.split(s[0])[1:])):
    print("String, {} is recursive".format(s))
else:
    print("String, {} is NOT recursive".format(s))
Output of the script:
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcabcabcabdac
String, abcabcabcabdac is NOT recursive
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcabcabc
String, abcabcabc is recursive
[mac] kgowda#blr-mp6xx:~/Desktop/my_work/play$ python dup.py
Enter string:: abcddabcddabcddddabc
String, abcddabcddabcddddabc is recursive

This can also be solved using a part of the Knuth–Morris–Pratt Algorithm.
The idea is to build a 1-D array with one entry per character of the word. For each index i we record the length of the longest proper prefix of the word that is also a suffix of the substring from 0 to i. If such a common prefix and suffix exists, a search can resume from the character right after the prefix, so the array entry doubles as the index at which to resume.
For s="abcabcabc", the array will be
Index : 0 1 2 3 4 5 6 7 8
String: a b c a b c a b c
KMP : 0 0 0 1 2 3 4 5 6
For Index = 2, there is no suffix which is also a prefix in the string abc, i.e. up to and including Index = 2, so KMP[2] = 0.
For Index = 4, the suffix ab (Index = 3, 4) is the same as the prefix ab (Index = 0, 1), so we set KMP[4] = 2, which is the index of the pattern from which we have to resume searching.
Thus KMP[i] holds the length of the longest proper prefix of s that matches a suffix of the range 0 to i. This means that the substring from 0 to index repeats with a period of index + 1 - KMP[index]. Using this information we can find out whether all the substrings of that length are the same.
For Index = 8, we know KMP[index] = 6, which means the prefix (s[0] to s[5]) is equal to the suffix (s[3] to s[8]), so the candidate pattern has length 9 - 6 = 3. If 3 divides the length of the string and every block of that length is the same, the string consists only of that pattern repeated.
For a clearer explanation of this algorithm please check this video lecture.
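This table can be built in linear time:
#include <string>
#include <vector>
using namespace std;
// Builds the KMP prefix table: kmp[i] is the length of the longest proper
// prefix of word that is also a suffix of word[0..i].
vector<int> buildKMPtable(const string& word)
{
    vector<int> kmp(word.size());
    int j = 0;
    for(int i = 1; i < (int)word.size(); ++i)
    {
        // Fall back through shorter borders until one can be extended.
        // The j > 0 guard avoids reading kmp[-1].
        while(j > 0 && word[j] != word[i])
        {
            j = kmp[j-1];
        }
        if(word[j] == word[i])
        {
            ++j;
        }
        kmp[i] = j;
    }
    return kmp;
}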
bool repeatedSubstringPattern(string s) {
    if(s.empty()) // Nothing can repeat in an empty string
    {
        return false;
    }
    auto kmp = buildKMPtable(s);
    if(kmp[s.size() -1] == 0) // Occurs when no prefix of the string is also a suffix ending at the last character
    {
        return false;
    }
    int diff = s.size() - kmp[s.size() -1]; // Length of the repetitive pattern
    if(s.size() % diff != 0) // Length of the repetitive pattern must be a divisor of the size of the string
    {
        return false;
    }
    // Check if that repetitive pattern is the only repetitive pattern.
    string word = s.substr(0, diff);
    int w_size = word.size();
    for(int i=0; i < w_size; ++i)
    {
        int j = i;
        while(j < (int)s.size())
        {
            if(word[i] == s[j])
            {
                j += w_size;
            }
            else
            {
                return false;
            }
        }
    }
    return true;
}
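Since the question asks for Java, the same prefix-table idea can be sketched in Java as well. This is a hedged port, not the answer author's code: the class name KmpRepeatCheck and the method names prefixTable and isRepeated are assumptions made for this example, and the final block-by-block verification is left out because a border of length n - period with period dividing n already implies the repetition.

```java
final class KmpRepeatCheck {
    // table[i] = length of the longest proper prefix of s that is also
    // a suffix of s[0..i]
    static int[] prefixTable(String s) {
        int[] table = new int[s.length()];
        int j = 0;
        for (int i = 1; i < s.length(); i++) {
            // fall back to shorter borders until one can be extended
            while (j > 0 && s.charAt(i) != s.charAt(j)) {
                j = table[j - 1];
            }
            if (s.charAt(i) == s.charAt(j)) {
                j++;
            }
            table[i] = j;
        }
        return table;
    }

    // true when s consists of one unit repeated at least twice
    static boolean isRepeated(String s) {
        int n = s.length();
        if (n == 0) {
            return false;
        }
        int border = prefixTable(s)[n - 1];
        int period = n - border; // candidate length of the repeating unit
        return border > 0 && n % period == 0;
    }
}
```

For "abcabcabc" the last table entry is 6, so the candidate unit length is 9 - 6 = 3, which divides 9.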

If you know the 'parts' in advance, then the answer could be Recursive regular expressions, it seems.
So for abcabcabc we need an expression like abc(?R)* where:
abc matches the literal characters
(?R) recurses the pattern
A * to match between zero and unlimited number of times
The third one is a little trickier. See this regex101 link but it looks like:
((abc)|(dd))(?R)*
where we have either 'abc' or 'dd' and there are any number of these.
Otherwise, I don't see how you could determine from just a string that it has some undefined recursive structure like this.
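One caveat for Java: java.util.regex does not support the recursive (?R) construct. As far as I can tell, a pattern of the form X(?R)* accepts the same strings as X+, so with known parts a plain repetition should be enough. The sketch below assumes the parts are 'abc' and 'dd'; the class name KnownPartsCheck and the method madeOfKnownParts are made up for illustration.

```java
import java.util.regex.Pattern;

final class KnownPartsCheck {
    // Assumption: the allowed parts ("abc" and "dd") are known in advance.
    private static final Pattern PARTS = Pattern.compile("(?:abc|dd)+");

    // true when the whole string is a sequence of the known parts
    static boolean madeOfKnownParts(String s) {
        return PARTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(madeOfKnownParts("abcddabcddabcddddabc"));
        System.out.println(madeOfKnownParts("abcabcabcabdac"));
    }
}
```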

Related

How to find the first occurrence of whitespace (tab, space, etc.) in Java?

So I have something like this
System.out.println(some_string.indexOf("\\s+"));
this gives me -1
but when I do with specific value like \t or space
System.out.println(some_string.indexOf("\t"));
I get the correct index.
Is there any way I can get the index of the first occurrence of whitespace without using split, as my string is very long.
PS - if it helps, here is my requirement: I want the first number in the string, which is separated from the rest of the string by a tab or space, and I am trying to avoid split("\\s+")[0]. The string starts with that number and has a space or tab right after it.
The point is: indexOf() takes a char, or a string; but not a regular expression.
Thus:
String input = "a\tb";
System.out.println(input);
System.out.println(input.indexOf('\t'));
prints 1 because there is a TAB char at index 1.
System.out.println(input.indexOf("\\s+"));
prints -1 because there is no substring \\s+ in your input value.
In other words: if you want to use the power of regular expressions, you can't use indexOf(). You would rather be looking at String.matches(), for example. But of course, that gives a boolean result, not an index.
If you intend to find the index of the first whitespace, you have to iterate the chars manually, like:
for (int index = 0; index < input.length(); index++) {
if (Character.isWhitespace(input.charAt(index))) {
return index;
}
}
return -1;
Something of this sort might help? Though there are better ways to do this.
class Sample{
public static void main(String[] args) {
String s = "1110 001";
int index = -1;
for(int i = 0; i < s.length(); i++ ){
if(Character.isWhitespace(s.charAt(i))){
index = i;
break;
}
}
System.out.println("Required Index : " + index);
}
}
Well, to find with a regular expression, you'll need to use the regular expression classes.
Pattern pat = Pattern.compile("\\s");
Matcher m = pat.matcher(s);
if ( m.find() ) {
System.out.println( "Found \\s at " + m.start());
}
The find method of the Matcher class locates the pattern in the string for which the matcher was created. If it succeeds, the start() method gives you the index of the first character of the match.
Note that you can compile the pattern only once (even create a constant). You just have to create a Matcher for every string.
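A minimal sketch of the "compile once" advice, which also covers the requirement from the question's PS (taking the leading number up to the first whitespace). The class name FirstWhitespace and the methods indexOfWhitespace and leadingToken are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstWhitespace {
    // compiled once and reused for every input string
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // index of the first whitespace character, or -1 if there is none
    static int indexOfWhitespace(String s) {
        Matcher m = WHITESPACE.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // everything before the first whitespace (the whole string if none)
    static String leadingToken(String s) {
        int index = indexOfWhitespace(s);
        return index < 0 ? s : s.substring(0, index);
    }
}
```

Because find() stops at the first match, the rest of a very long string is never scanned, unlike split("\\s+").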

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get First Character values like
AS, MFD
The prefix is not always the first two or the first three characters, because the input can change. I need to get the leading characters before the first number.
Thank you.
Edit : This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}
Here is a nice one line solution. It uses a regex to match the first non numeric characters in the string, and then replaces the input string with this match.
public String getFirstLetters(String input) {
return new String("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
.substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)
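The same result can probably be reached without prepending a dummy character, by matching the leading run of non-digits directly. This is a sketch under assumptions: the class LeadingNonDigits and the method prefixBeforeDigit are names invented here, and \D is taken to mean "anything except 0-9", which is its default meaning in java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingNonDigits {
    // ^\D* matches the (possibly empty) run of non-digits at the start
    private static final Pattern PREFIX = Pattern.compile("^\\D*");

    static String prefixBeforeDigit(String input) {
        Matcher m = PREFIX.matcher(input);
        // the pattern can match the empty string, so find() succeeds for any input
        return m.find() ? m.group() : "";
    }
}
```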
A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}
Use the following function to get the required output:
public String getFirstChars(String str){
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < str.length(); i++){
        char c = str.charAt(i);
        if (c >= '0' && c <= '9'){
            // stop at the first digit and return what was collected so far
            return result.toString();
        }
        result.append(c);
    }
    // no digit found: the whole string is the prefix
    return str;
}
Pass your string as the argument.
I think this can be done with a simple regex that matches digits and Java's String.split function. This regex-based approach will be more efficient than the methods using more complicated regexes.
Something as below will work
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used "[\\d]+" is escaped for java already.
What it does?
It matches one or more digits (\d). In Java, \d matches only the ASCII digits 0-9 by default; it matches digits of other scripts (for example Arabic-Indic digits) only when the Pattern.UNICODE_CHARACTER_CLASS flag is enabled.
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and separates the string into string arrays where ever a match is found. The [0] at the end selects the first result from that array, since you wanted the starting chars.
What is the second parameter to .split(regex,int) which I supplied as 2?
This is the Limit parameter. This means that the regex will be applied on the string till 1 match is found. Once 1 match is found the string is not processed anymore.
From the String Javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will be efficient if your string is huge.
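To make the effect of the limit parameter concrete, here is a small sketch comparing split with and without a limit. The class name SplitLimitDemo and the helper firstPart are made up for this example; only String.split and Arrays.toString from the standard library are used.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Hypothetical helper: text before the first run of digits.
    static String firstPart(String input) {
        // limit 2: the pattern is applied at most once, so the rest of the
        // string is returned unsplit as the second array element
        return input.split("\\d+", 2)[0];
    }

    public static void main(String[] args) {
        // with limit 2 the array has at most two entries
        System.out.println(Arrays.toString("A1B2C3".split("\\d+", 2)));
        // without a limit every run of digits is used as a delimiter
        System.out.println(Arrays.toString("A1B2C3".split("\\d+")));
        System.out.println(firstPart("ABC345."));
    }
}
```

If the input starts with a digit, the first array element is the empty string, which matches the behaviour of the other answers for inputs such as "1AS123".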
Another possible regex, which spells out the digits 0-9 explicitly:
"[0-9]+"
public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (Character i : testString.toCharArray()) {
if (Character.isDigit(i))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
this prints the first 'n' characters before it encounters a digit (i.e. integer).

String manipulation of function names

For this Kata, i am given random function names in the PEP8 format and i am to convert them to camelCase.
(input)get_speed == (output)getSpeed ....
(input)set_distance == (output)setDistance
I have a understanding on one way of doing this written in pseudo-code:
loop through the word,
if the letter is an underscore
then delete the underscore
then get the next letter and change to a uppercase
endIf
endLoop
return the resultant word
But im unsure the best way of doing this, would it be more efficient to create a char array and loop through the element and then when it comes to finding an underscore delete that element and get the next index and change to uppercase.
Or would it be better to use recursion:
function camelCase takes a string
if the length of the string is 0,
then return the string
endIf
if the character is a underscore
then change to nothing,
then find next character and change to uppercase
return the string taking away the character
endIf
finally return the function taking the first character away
Any thoughts please, looking for a good efficient way of handing this problem. Thanks :)
I would go with this:
divide given String by underscore to array
from second word until end take first letter and convert it to uppercase
join to one word
This runs in O(n) (it passes over the name three times). For the first step, use this function:
str.split("_");
For the uppercase step use this:
String newName = str.substring(0, 1).toUpperCase() + str.substring(1);
But make sure you check the size of the string first...
Edited - added implementation
It would look like this:
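public String camelCase(String str) {
    if (str == null || str.trim().length() == 0) return str;
    String[] split = str.split("_");
    if (split.length == 0) return ""; // input consisted only of underscores
    StringBuilder newStr = new StringBuilder(split[0]);
    for (int i = 1; i < split.length; i++) {
        if (split[i].isEmpty()) continue; // skip empty parts produced by "__"
        newStr.append(split[i].substring(0, 1).toUpperCase()).append(split[i].substring(1));
    }
    return newStr.toString();
}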
for inputs:
"test"
"test_me"
"test_me_twice"
it returns:
"test"
"testMe"
"testMeTwice"
It would be simpler to iterate over the string instead of recursing.
String pep8 = "do_it_again";
StringBuilder camelCase = new StringBuilder();
for(int i = 0, l = pep8.length(); i < l; ++i) {
if(pep8.charAt(i) == '_' && (i + 1) < l) {
camelCase.append(Character.toUpperCase(pep8.charAt(++i)));
} else {
camelCase.append(pep8.charAt(i));
}
}
System.out.println(camelCase.toString()); // prints doItAgain
The question you pose is whether to use an iterative or a recursive approach. For this case I'd go for the iterative approach because it's straightforward, easy to understand and doesn't require many resources (only one array, no new stack frames, etc.), though that doesn't really matter for this example.
Recursion is good for divide-and-conquer problems, but I don't see that fitting this case well, although it's possible.
An iterative implementation of the algorithm you described could look like the following:
StringBuilder buf = new StringBuilder(input);
for(int i = 0; i < buf.length(); i++){
    if(buf.charAt(i) == '_'){
        buf.deleteCharAt(i);
        if(i != buf.length()){ //check for EOL
            buf.setCharAt(i, Character.toUpperCase(buf.charAt(i)));
        }
    }
}
return buf.toString();
The check for the EOL is not part of the given algorithm and could be omitted if the input string never ends with '_'.
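For comparison, the recursive variant described in the question's pseudo-code could look roughly like the sketch below. It is only an illustration: the class name CamelCaseRecursive is invented here, a trailing underscore is simply dropped (an assumption, since the question does not say what should happen), and each call creates a new substring, which is part of why the iterative version is usually preferable.

```java
final class CamelCaseRecursive {
    static String camelCase(String s) {
        if (s.isEmpty()) {
            return s; // base case: nothing left to convert
        }
        if (s.charAt(0) == '_') {
            if (s.length() == 1) {
                return ""; // assumption: a trailing underscore is dropped
            }
            // drop the underscore, upper-case the next character, recurse on the rest
            return Character.toUpperCase(s.charAt(1)) + camelCase(s.substring(2));
        }
        // keep the first character and recurse on the remainder
        return s.charAt(0) + camelCase(s.substring(1));
    }
}
```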

Counting the vowels included between two consonants

I'm trying to find, from a sentence, the words that contains two vowels between two r using java. So I read in the sentence and then I have to find the words that match the criteria described above. For instance if I have a string such as: "roar soccer roster reader" the method matches should return true for the words "roar" and "roster"
This is the method I come up with, which is doing the job
public boolean matches(String singleWord)
{
// set count to -1. it will increase to 2 if a 'r' is found, it decreases for each vowel
int count = -1;
// loop through a single word
for (int i=0; i<singleWord.length(); i++){
// if a 'r' is found set the count to two
if(singleWord.charAt(i) == 'r'){
// when count it's 0 exit loop
if (count == 0)
return true;
count = 2;}
// if I find a vowel count decreases
else if(isVowel(singleWord.charAt(i))){
count--;}
}
return false;
}
but it seems a bit clumsy... any suggestion on how to improve it or make it simpler? thanx!!!
just in case, this is the isVowel method
private boolean isVowel(char c)
{
String s = c + "";
return "aeiou".contains(s);
}
You can do this using a straightforward algorithm without loops:
Find the index of the first 'r'
Find the index of the last 'r'
Cut the substring in between the two
Return true if removing all vowels from the substring shortens it at least by two characters.
Here is how you can implement it:
boolean matches(String singleWord) {
int from = singleWord.indexOf('r');
int to = singleWord.lastIndexOf('r');
if (from < 0 || from == to) return false;
String sub = singleWord.substring(from+1, to);
return (sub.length() - sub.replaceAll("[aeiou]", "").length()) == 2;
}
Here is how it works step by step, using the word "roadster" as an example:
from = 0, to = 7
sub = "oadste"; length is 6
sub after replacement is "dst"; length is 3
The difference 6 - 3 is 3, not 2, so false is returned.
EDIT : The sequence must contain exactly two vowels, with no intervening 'r's.
This makes a problem slightly different, because the trick with the first and the last index no longer applies. However, a regex to match the desired sequence can be constructed relatively easily - here it is:
"r[^raeiou]*[aeiou][^raeiou]*[aeiou][^raeiou]*r"
In order to understand this regexp, all you need to know is that [...] matches any character inside brackets, [^...] matches any character except the ones in brackets, and * matches the preceding subexpression zero or more times.
The expression is lengthy, but it is composed of trivial pieces. It matches as follows:
An initial r
Zero or more non-vowels except r
The first vowel
Zero or more non-vowels except r
The second vowel
Zero or more non-vowels except r
The closing r
Here is a simple implementation:
boolean matches(String singleWord) {
return singleWord
.replaceAll("r[^raeiou]*[aeiou][^raeiou]*[aeiou][^raeiou]*r", "")
.length() != singleWord.length();
}
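An alternative sketch of the same check compiles the pattern once and uses Matcher.find() rather than comparing lengths after replaceAll. The class name TwoVowelsBetweenR is an assumption made for this example; the regex itself is the one given above.

```java
import java.util.regex.Pattern;

final class TwoVowelsBetweenR {
    // an r, exactly two vowels with no other r in between, then a closing r
    private static final Pattern SEQUENCE =
            Pattern.compile("r[^raeiou]*[aeiou][^raeiou]*[aeiou][^raeiou]*r");

    static boolean matches(String singleWord) {
        // find() succeeds if the sequence occurs anywhere in the word
        return SEQUENCE.matcher(singleWord).find();
    }

    public static void main(String[] args) {
        for (String word : "roar soccer roster reader".split(" ")) {
            System.out.println(word + ":" + matches(word));
        }
    }
}
```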
You can use a regular expression:
public static boolean matches(final String singleWord) {
return singleWord.matches(".*r([^aeiour]*[aeiou]){2}[^aeiour]*r.*");
}
Here is the test code:
for (String word: "roar soccer roster reader rarar".split(" "))
System.out.println(word+":"+matches(word));
And here is the output:
roar:true
soccer:false
roster:true
reader:false
rarar:false
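You could also use a regular expression, applied to one word at a time (the backslashes must be doubled in a Java string literal, and \w does not match the spaces between words):
java.util.regex.Pattern.matches("\\w*r\\w*([aeiou]\\w*){2}r\\w*", "roar");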

Finding the longest substring between a "start" string and one of 3 possible "end" strings

So my question is substring-related.
How do you find the longest possible substring between a starting string and one of three ending strings? I also need to find the index of the original string that the largest substring starts at.
So:
Start string:
"ATG"
3 possible end strings:
"TAG"
"TAA"
"TGA"
An example original string might be:
"SDAFKJDAFKATGDFSDFAKJDNKSJFNSDTGASDFKJSDNKFJSNDJFATGDSDFKJNSDFTAGSDFSDATGFF"
So the result of that should give me:
- Longest substring length: 23 (from the substring ATGDFSDFAKJDNKSJFNSDTGA)
- Index of longest substring: 10
I cannot use Regex.
Thanks for any help!
This is arguably the easiest way, and it's just one line:
String target = str.replaceAll(".*ATG(.*)(TAG|TAA|TGA).*", "$1");
To find the index:
int index = str.indexOf("ATG") + 3;
Note: I have interpreted your remark "I cannot use regex" to mean "I am unskilled at regex", because if it's a java question, regex is available.
Well, this looks like a fun one.
It seems the most straightforward way to do this would be to build your own mini finite state machine. You would have to parse each character in the string and keep track of all possible character sequences that would terminate the sequence.
If you hit a 'T', you need to jump ahead and look at the next character. If it's an 'A' or a 'G' you need to jump ahead again, otherwise, add those tokens to your string. Continue the pattern until you get to the end of the original string, or match one of your terminal patterns.
So, maybe something that looks like this (simplified example):
String longestSequence(String original) {
StringBuilder sb = new StringBuilder();
char[] tokens = original.toCharArray();
for (int i = 0; i < tokens.length; ++i) {
// read each token, and compare / look ahead to see if you should keep going or terminate.
}
return sb.toString();
}
Match your string against this regex:
ATG[A-Z]+(TAG|TAA|TGA)
If multiple matches occur, iterate over them and keep the longest one.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// compile the pattern once and iterate over all matches
Pattern pattern = Pattern.compile("ATG[A-Z]+(TAG|TAA|TGA)");
Matcher matcher = pattern.matcher( yourInputStringHere );
while (matcher.find()) {
System.out.println("Found the text \"" + matcher.group()
+ "\" starting at " + matcher.start()
+ " and ending at index " + matcher.end());
}
There are already some beautiful and elegant solutions to your problem (Bohemian and inquisitive). If you still - as originally stated - can't use regex, here's an alternative. This code is not especially elegant, and as pointed, there are better ways to do it, but it should at least clearly show you the logic behind the solution to your problem.
How do you find the longest possible substring between a starting string
and one of three ending strings?
First, find the index of starting string, then find the index of each ending string, and get substrings for each ending, then their length. Remember that if string is not found, its index will be -1.
String originalString = "SDAFKJDAFKATGDFSDFAKJDNKSJFNSDTGASDFKJSDNKFJSNDJFATGDSDFKJNSDFTAGSDFSDATGFF";
String STARTING_STRING = "ATG";
String END1 = "TAG";
String END2 = "TAA";
String END3 = "TGA";
//let's find the index of STARTING_STRING
int posOfStartingString = originalString.indexOf(STARTING_STRING);
//if found
if (posOfStartingString != -1) {
int tagPos[] = new int[3];
//let's find the index of each ending strings in the original string
tagPos[0] = originalString.indexOf(END1, posOfStartingString+3);
tagPos[1] = originalString.indexOf(END2, posOfStartingString+3);
tagPos[2] = originalString.indexOf(END3, posOfStartingString+3);
int lengths[] = new int[3];
//we can now use the following methods:
//public String substring(int beginIndex, int endIndex)
//where beginIndex is our posOfStartingString
//and endIndex is the position just past each ending string (if found),
//so that the ending string is included, as in the expected length of 23
//
//and finally, String.length() to get the length of each substring
if (tagPos[0] != -1) {
lengths[0] = originalString.substring(posOfStartingString, tagPos[0] + END1.length()).length();
}
if (tagPos[1] != -1) {
lengths[1] = originalString.substring(posOfStartingString, tagPos[1] + END2.length()).length();
}
if (tagPos[2] != -1) {
lengths[2] = originalString.substring(posOfStartingString, tagPos[2] + END3.length()).length();
}
} else {
//no starting string in original string
}
lengths[] table now contains length of strings starting with STARTING_STRING and 3 respective endings. Then just find which one is the longest and you will have your answer.
I also need to find the index of the original string that the largest substring starts at.
This will be the index of where starting string starts, in this case 10.
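The indexOf-based steps can be combined into one regex-free sketch that looks at every occurrence of the starting string, not only the first. It rests on an interpretation that is not stated in the question: for each "ATG" the substring ends at the nearest following end string, the end string is included in the length, and the longest such substring wins. That reading reproduces the expected result (index 10, length 23) for the example. The class name LongestBetween and the method longest are hypothetical.

```java
final class LongestBetween {
    static final String START = "ATG";
    static final String[] ENDS = {"TAG", "TAA", "TGA"};

    // Returns {startIndex, length} of the longest substring that begins with
    // START and ends with the nearest following end string, or {-1, 0}.
    static int[] longest(String s) {
        int bestStart = -1;
        int bestLength = 0;
        for (int start = s.indexOf(START); start >= 0; start = s.indexOf(START, start + 1)) {
            int nearestEndExclusive = -1;
            for (String end : ENDS) {
                // search only after the start string so the two cannot overlap
                int pos = s.indexOf(end, start + START.length());
                if (pos >= 0 && (nearestEndExclusive < 0 || pos + end.length() < nearestEndExclusive)) {
                    nearestEndExclusive = pos + end.length();
                }
            }
            if (nearestEndExclusive < 0) {
                continue; // no end string follows this start
            }
            int length = nearestEndExclusive - start;
            if (length > bestLength) {
                bestLength = length;
                bestStart = start;
            }
        }
        return new int[] {bestStart, bestLength};
    }
}
```

The substring itself can then be obtained with s.substring(result[0], result[0] + result[1]) when result[0] is not -1.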
