Common Substring of two strings

Common Substring of two strings - java

This particular interview-question stumped me:
Given two Strings S1 and S2. Find the longest Substring which is a Prefix of S1 and suffix of S2.
Through Google, I came across the following solution, but didnt quite understand what it was doing.
public String findLongestSubstring(String s1, String s2) {
List<Integer> occurs = new ArrayList<>();
for (int i = 0; i < s1.length(); i++) {
if (s1.charAt(i) == s2.charAt(s2.length()-1)) {
occurs.add(i);
}
}
Collections.reverse(occurs);
for(int index : occurs) {
boolean equals = true;
for(int i = index; i >= 0; i--) {
if (s1.charAt(index-i) != s2.charAt(s2.length() - i - 1)) {
equals = false;
break;
}
}
if(equals) {
return s1.substring(0,index+1);
}
}
return null;
}
My questions:
How does this solution work?
And how do you get to discovering this solution?
Is there a more intuitive / easier solution?

Part 2 of your question
Here is a shorter variant:
public String findLongestPrefixSuffix(String s1, String s2) {
for( int i = Math.min(s1.length(), s2.length()); ; i--) {
if(s2.endsWith(s1.substring(0, i))) {
return s1.substring(0, i);
}
}
}
I am using Math.min to find the length of the shortest String, as I don't need to and cannot compare more than that.
someString.substring(x,y) returns you the String you get when reading someString beginning from character x and stopping at character y. I go backwards from the biggest possible substring (s1 or s2) to the smallest possible substring, the empty string. This way the first time my condition is true it will be biggest possible substring the fulfills it.
If you prefer you can go the other way round, but you have to introduce a variable saving the length of the longest found substring fulfilling the condition so far:
public static String findLongestPrefixSuffix(String s1, String s2) {
if (s1.equals(s2)) { // this part is optional and will
return s1; // speed things up if s1 is equal to s2
} //
int max = 0;
for (int i = 0; i < Math.min(s1.length(), s2.length()); i++) {
if (s2.endsWith(s1.substring(0, i))) {
max = i;
}
}
return s1.substring(0, max);
}
For the record: You could start with i = 1 in the latter example for a tiny bit of extra performance. On top of this you can use i to specify how long the suffix has at least to be you want to get. ;) If you writ Math.min(s1.length(), s2.length()) - x you can use x to specify how long the found substring may be at most. Both of these things are possible with the first solution, too, but the min length is a bit more involving. ;)
Part 1 of your question
In the part above the Collections.reverse the author of the code searches for all positions in s1 where the last letter of s2 is and saves this position.
What follows is essentially what my algorithm does, the difference is, that he doesn't check every substring but only those that end with the last letter of s2.
This is some sort of optimization to speed things up. If speed is not that important my naive implementation should suffice. ;)

Where did you find that solution? Was it written by a credible, well-respected coder? If you're not sure of that, then it might not be worth reading it. One could write really complex and inefficient code to accomplish something really simple, and it will not be worth understanding the algorithm.
Rather than trying to understand somebody else's solution, it might be easier to come up with it on your own. I think you understand the problem much better that way, and the logic becomes your own. Over time and practice the thought process will start to come more naturally. Practice makes perfect.
Anyway, I put a more simple implementation in Python here (spoiler alert!). I suggest you first figure out the solution on your own, and compare it to mine later.

Apache commons lang3, StringUtils.getCommonPrefix()
Java is really bad in providing useful stuff via stdlib. On the plus side there's almost always some reasonable tool from Apache.

I converted the #TheMorph's answer to javascript. Hope this helps js developer
if (typeof String.prototype.endsWith !== 'function') {
String.prototype.endsWith = function(suffix) {
return this.indexOf(suffix, this.length - suffix.length) !== -1;
};
}
function findLongestPrefixSuffix(s2, s1) {
for( var i = Math.min(s1.length, s2.length); ; i--) {
if(s2.endsWith(s1.substring(0, i))) {
return s1.substring(0, i);
}
}
}
console.log(findLongestPrefixSuffix('abc', 'bcd')); // result: 'bc'

Related

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. It then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is with showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought recursion. However, with 20 letters this can take quite a long time to execute. So, I also implemented a binary search to check if the current recursion path is a a prefix to anything in the dictionary. I do get valid hints to be output however it's not returning all possible words. Do I have a mistake here in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you check each word in a dictionary and see if the characters can make each word. However, I'd like to know how effective my method is vs. that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}

The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree)
Trie data structures offer two relevant methods. isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exist in a dictionary (where n is the number of letters in the argument). This is really fast because it doesn't matter how large your dictionary is.
For comparison, your method for checking if a prefix exists in your dictionary using binary search is O(n*log(m)) where n is the number of letters in the string and m is the number of words in the dictionary.
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With 20-char input, the Trie took 9ms. The original code didn't complete in reasonable time so I had to kill it.
Edit:
As to why your code doesn't return all hints, you don't want to break if the prefix is not in your dict. You should continue to check the next prefix instead.

Is there a recommended, faster algorithm?
See Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Java - Complex Recursive Backtracking

For Java practice I started working on a method countBinary that accepts an integer n as a parameter that prints all binary numbers that have n digits in ascending order, printing each value on a separate line. Assuming n is non-negative and greater than 0, some example outputs would look like this.
I am getting pretty much nowhere with this. I am able to write a program that finds all possible letter combinations of a String and similar things, but I have been unable to make almost any progress with this specific problem using binary and integers.
Apparently the best way to go about this issue is by defining a helper method that accepts different parameters than the original method and by building up a set of characters as a String for eventual printing.
Important Note: I am NOT supposed to use for loops at all for this exercise.
Edit - Important Note: I need to have trailing 0's so that all outputs are the same length.
So far this is what I have:
public void countBinary(int n)
{
String s = "01";
countBinary(s, "", n);
}
private static void countBinary(String s, String chosen, int length)
{
if (s.length() == 0)
{
System.out.println(chosen);
}
else
{
char c = s.charAt(0);
s = s.substring(1);
chosen += c;
countBinary(s, chosen, length);
if (chosen.length() == length)
{
chosen = chosen.substring(0, chosen.length() - 1);
}
countBinary(s, chosen, length);
s = c + s;
}
}
When I run my code my output looks like this.
Can anyone explain to me why my method is not running the way I expect it to, and if possible show me a solution to my issue so that I might get the correct output? Thank you!

There are more efficient ways to do it, but this will give you a start:
public class BinaryPrinter {
static void printAllBinary(String s, int n) {
if (n == 0) System.out.println(s);
else {
printAllBinary(s + '0', n - 1);
printAllBinary(s + '1', n - 1);
}
}
public static void main(String [] args) {
printAllBinary("", 4);
}
}
I'll let you work out the more efficient way.

Is my algorithm correct? (string concatenation)

Given a set S of m strings and a target string t we want to check whether or not t is the concatenation of some of the m strings while allowing repetition.
Example : S= {ab, dbe, eaa, ea} and t=eaabdbeab. Here the answer is YES, t=ea ab dbe ab.
Algorithm :
boolean IsConcatenation{S, t, i} {
int a=S[i].length;
int b=t.length-1;
String str=t.charAt(1)+t.charAt(2)+...+t.charAt(a-1);`
if (S[i]==str) { `
if (i<m) //m=S.length
boolean A= IsConcatenation(S,t, i++);
t=t.charAt(a)+t.charAt(a+1)+...+t.charAt(b);
boolean B= IsConcatenation(S,t, 0);
}
if (i==m+1)
return false;
if (t.length==0}
return true;
return IsConcatenation(S,t, i++);
}
This is my pseudo code. I would be very grateful if you could tell me whether my algorithm is correct or no.
Thank you.

The idea looks correct but slow.
Some of the implementation details (for example, using i++ instead of i + 1, and also possibly some off-by-one errors) are wrong, and I believe this is pseudocode, not code in any existing language.
To hint on a faster solution, consider solving the following set of subproblems: can a prefix of T of length k be represented as a concatenation of some strings from S? These can be solved with a dynamic programming approach for all k from 1 up to the total length of T. The largest of these subproblems is the actual problem we want to solve.

Guava's CharMatcher removeFrom

There are two questions here actually. The first one:
1) Is Java smart enough not to copy one array element into itself?
What I mean by that is :
int i = 1;
char [] chars=... //some chars
char[1] = char[i]; // first element into itself
2) Are the benchmarking's that guava made available to the public?
And what I mean by that is: I was looking into the source code of CharMatcher removeFrom method and saw this :
// This unusual loop comes from extensive benchmarking
OUT: while (true) {
pos++;
while (true) {
if (pos == chars.length) {
break OUT;
}
if (isLetter(chars[pos])) {
break;
}
chars[pos - spread] = chars[pos];
pos++;
}
spread++;
}
return new String(chars, 0, pos - spread);
I really liked the idea, but coded my own method:
public static String removeMine(String input){
char [] chars = input.toCharArray();
int howManyLetters = 0;
for(int i=0;i<chars.length;++i){
if(isLetter(chars[i])) {
chars[howManyLetters++] = chars[i];
}
else {
if(i == (chars.length - 1)) break;
chars[i] = chars[i+1];
}
}
return new String(chars, 0, howManyLetters);
}
I then added some benchmarks (I will put them on github if needed), here are the results:
https://microbenchmarks.appspot.com/runs/61e76bdc-b0d6-4145-8b8b-c1683287f038#r:scenario.benchmarkSpec.parameters.input,scenario.benchmarkSpec.methodName
I have (serious?) doubts that creators of guava did not have a version like that and I assume there was a strong reason to drop it (the comment is more then obvious). What I would like to see is either the actual benchmarks that they made or some serious reason to have such a method. I assume it has to do with the type of JVM that you run your code into, but actual proof would be appreciated.
P.S. I am still to test this with jmh also, will provide results once I'm done.

Javabat substring counting

public boolean catDog(String str)
{
int count = 0;
for (int i = 0; i < str.length(); i++)
{
String sub = str.substring(i, i+1);
if (sub.equals("cat") && sub.equals("dog"))
count++;
}
return count == 0;
}
There's my code for catDog, have been working on it for a while and just cannot find out what's wrong. Help would be much appreciated!*/
EDIT- I want to Return true if the string "cat" and "dog" appear the same number of times in the given string.

One problem is that this will never be true:
if (sub.equals("cat") && sub.equals("dog"))
&& means and. || means or.
However, another problem is that your code looks like your are flailing around randomly trying to get it to work. Everyone does this to some extent in their first programming class, but it's a bad habit. Try to come up with a clear mental picture of how to solve the problem before you write any code, then write the code, then verify that the code actually does what you think it should do and that your initial solution was correct.
EDIT: What I said goes double now that you've clarified what your function is supposed to do. Your approach to solving the problem is not correct, so you need to rethink how to solve the problem, not futz with the implementation.

Here's a critique since I don't believe in giving code for homework. But you have at least tried which is better than most of the clowns posting homework here.
you need two variables, one for storing cat occurrences, one for dog, or a way of telling the difference.
your substring isn't getting enough characters.
a string can never be both cat and dog, you need to check them independently and update the right count.
your return statement should return true if catcount is equal to dogcount, although your version would work if you stored the differences between cats and dogs.
Other than those, I'd be using string searches rather than checking every position but that may be your next assignment. The method you've chosen is perfectly adequate for CS101-type homework.
It should be reasonably easy to get yours working if you address the points I gave above. One thing you may want to try is inserting debugging statements at important places in your code such as:
System.out.println(
"i = " + Integer.toString (i) +
", sub = ["+sub+"]" +
", count = " + Integer.toString(count));
immediately before the closing brace of the for loop. This is invaluable in figuring out what your code is doing wrong.
Here's my ROT13 version if you run into too much trouble and want something to compare it to, but please don't use it without getting yours working first. That doesn't help you in the long run. And, it's almost certain that your educators are tracking StackOverflow to detect plagiarism anyway, so it wouldn't even help you in the short term.
Not that I really care, the more dumb coders in the employment pool, the better it is for me :-)
choyvp obbyrna pngQbt(Fgevat fge) {
vag qvssrerapr = 0;
sbe (vag v = 0; v < fge.yratgu() - 2; v++) {
Fgevat fho = fge.fhofgevat(v, v+3);
vs (fho.rdhnyf("png")) {
qvssrerapr++;
} ryfr {
vs (fho.rdhnyf("qbt")) {
qvssrerapr--;
}
}
}
erghea qvssrerapr == 0;
}

Another thing to note here is that substring in Java's built-in String class is exclusive on the upper bound.
That is, for String str = "abcdefg", str.substring( 0, 2 ) retrieves "ab" rather than "abc." To match 3 characters, you need to get the substring from i to i+3.

My code for do this:
public boolean catDog(String str) {
if ((new StringTokenizer(str, "cat")).countTokens() ==
(new StringTokenizer(str, "dog")).countTokens()) {
return true;
}
return false;
}
Hope this will help you
EDIT: Sorry this code will not work since you can have 2 tokens side by side in your string. Best if you use countMatches from StringUtils Apache commons library.

String sub = str.substring(i, i+1);
The above line is only getting a 2-character substring so instead of getting "cat" you'll get "ca" and it will never match. Fix this by changing 'i+1' to 'i+2'.
Edit: Now that you've clarified your question in the comments: You should have two counter variables, one to count the 'dog's and one to count the 'cat's. Then at the end return true if count_cats == count_dogs.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Common Substring of two strings - java

Apache commons lang3, StringUtils.getCommonPrefix() Java is really bad in providing useful stuff via stdlib. On the plus side there's almost always some reasonable tool from Apache.

Related

Find all valid words when given a string of characters (Recursion / Binary Search)

Java - Complex Recursive Backtracking

Is my algorithm correct? (string concatenation)

Guava's CharMatcher removeFrom

Javabat substring counting

Categories

Resources