Longest Common Subsequence Algorithm Explanation


The pseudocode for the longest common subsequence problem is listed below.
longest-common-subsequence(s1, s2):
If the strings begin with the same letter c, the result to return is c
plus the longest common subsequence between the rest of s1 and s2
(that is, s1 and s2 without their first letter). For example, the
longest subsequence between "hollow" and "hello" is an "h" plus the
longest subsequence found between "ollow" and "ello".
Otherwise, if the strings do not begin with the same letter, return the
longer of the following two:
1. The longest common subsequence between s1 and the rest of s2 (s2
without its first letter).
2. The longest common subsequence between the rest of s1 (s1 without its
first letter) and s2.
For example, longest-common-subsequence("ollow", "ello") is the
longer of longest-common-subsequence("ollow", "llo") and
longest-common-subsequence("llow", "ello").
The part that I don't get is when the strings do not begin with the same letter, why do we take (s1 without the first letter, s2), (s1, s2 without the first letter). Why do we recursively go through these steps when they do not match? Is it just a set algorithm that is hard to understand? What is the reasoning behind this?

While #yash mahajan already covered everything, I'll just provide another way to think about it.
Go through two strings, assume you are at position i on string A (of length m) and position j on string B (of length n).
1. If the current two characters of both strings are the same:
longest common subsequence so far = longest common subsequence between substring A[0...i-1] and substring B[0...j-1] + 1.
2. If the two characters are different:
longest common subsequence = Max(longest common subsequence between substring A[0...i-1] and string B, longest common subsequence between string A and substring B[0...j-1])
You will have a clearer idea if you read the code.
public class Solution {
    public int longestCommonSubsequence(String A, String B) {
        if (A == null || B == null || A.length() == 0 || B.length() == 0) {
            return 0;
        }
        int m = A.length();
        int n = B.length();
        // commonSubsequenceLength[i][j] = LCS length of the first i chars of A and the first j chars of B
        int[][] commonSubsequenceLength = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (A.charAt(i - 1) == B.charAt(j - 1)) {
                    commonSubsequenceLength[i][j] = commonSubsequenceLength[i - 1][j - 1] + 1;
                } else {
                    commonSubsequenceLength[i][j] = Math.max(commonSubsequenceLength[i][j - 1], commonSubsequenceLength[i - 1][j]);
                }
            }
        }
        return commonSubsequenceLength[m][n];
    }
}

When we know that the first characters of the two strings don't match, it's clear we can't include them in our longest subsequence as we do in the first case. The obvious choice would seem to be to drop both of these characters and search the rest of the two strings for our subsequence. But consider this example: "hello" and "ello". If we drop both first characters, we throw away the first character of our subsequence ("ello"). So we go for two cases:
1. We remove the first character of the first string and search the rest against the whole second string.
2. We remove the first character of the second string and search the rest against the whole first string.
And then we take maximum of those two.
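
As a rough illustration of the two cases, here is a minimal recursive sketch in Java that follows the pseudocode from the question directly. The class name LcsRecursive and the method name lcs are made up for this example, and the method takes exponential time, so it is only meant for short inputs.

```java
class LcsRecursive {
    // Returns one longest common subsequence of s1 and s2.
    static String lcs(String s1, String s2) {
        if (s1.isEmpty() || s2.isEmpty()) {
            return "";
        }
        if (s1.charAt(0) == s2.charAt(0)) {
            // Same first letter: keep it and solve for the rest of both strings.
            return s1.charAt(0) + lcs(s1.substring(1), s2.substring(1));
        }
        // Different first letters: at least one of them is not part of the answer,
        // but we don't know which, so try dropping each one and keep the longer result.
        String withoutFirstOfS1 = lcs(s1.substring(1), s2);
        String withoutFirstOfS2 = lcs(s1, s2.substring(1));
        return withoutFirstOfS1.length() >= withoutFirstOfS2.length()
                ? withoutFirstOfS1
                : withoutFirstOfS2;
    }

    public static void main(String[] args) {
        System.out.println(lcs("hello", "ello"));
        System.out.println(lcs("hollow", "hello"));
    }
}
```

With "hello" and "ello", dropping only the "h" keeps the whole of "ello" available, which is exactly what dropping both first characters would lose.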

To answer your question:
The part that I don't get is when the strings do not begin with the
same letter, why do we take (s1 without the first letter, s2), (s1, s2
without the first letter). Why do we recursively go through these
steps when they do not match? Is it just a set algorithm that is hard
to understand? What is the reasoning behind this?
What is confusing for you is the recursive call, I guess. The whole idea is to reduce the problem to a smaller input set. In this case 1 character less at a time.
You have 2 cases for the selected character (first or last):
1. There is a match (for example "h" in "hollow", "hello"): simply reduce the input size by 1 character in both strings and recursively call the same function.
2. No match. You have 2 options here: either the first string has an extra unwanted character, or the second one does. Hence, make a recursive call for both cases and choose the maximum of them.
Extra details:
This problem has properties of a typical Dynamic Programming (DP) Problem.
1) Optimal Substructure
2) Overlapping Sub-problems
Hope it helps!
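
To make the overlapping sub-problems point concrete, here is a small top-down sketch that caches the result for each pair of positions, so every sub-problem is solved only once. The names LcsMemo, lcsLength and solve are assumptions for this example, not something from the question.

```java
class LcsMemo {
    // Returns the length of the longest common subsequence of a and b.
    static int lcsLength(String a, String b) {
        // memo[i][j] caches the answer for the suffixes a[i..] and b[j..]; null means "not computed yet".
        Integer[][] memo = new Integer[a.length() + 1][b.length() + 1];
        return solve(a, b, 0, 0, memo);
    }

    private static int solve(String a, String b, int i, int j, Integer[][] memo) {
        if (i == a.length() || j == b.length()) {
            return 0;
        }
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        int result;
        if (a.charAt(i) == b.charAt(j)) {
            result = 1 + solve(a, b, i + 1, j + 1, memo);
        } else {
            result = Math.max(solve(a, b, i + 1, j, memo), solve(a, b, i, j + 1, memo));
        }
        memo[i][j] = result;
        return result;
    }
}
```

There are only (m + 1) * (n + 1) distinct (i, j) pairs, so the running time drops from exponential to O(m * n).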

Related

Java newbie: cutting a string off?

I'm new to programming (taking a class) and I'm not sure how to accomplish this one task.
"Ignoring case, find the last occurrence of an ‘a’ in the input and remove all of the characters following it. In the case where there are no ‘a’s in the word, remove all but the first two characters (reminder: do not use if statements or loops). At the end of the now truncated word, add a number that is the percentage that the length of the truncated word is of the length of the original word; this percentage should be rounded to the closest integer value."
I'll be fine with the percentage part, but I'm not sure how to do the first part.
How do I remove only after the last occurrence of 'a'?
If there is no 'a' how do I cut it off after the first two letters without using an if statement?
I'm assuming its to be done using string manipulation and various substrings, but I'm not sure how the criteria for the substrings should be made.
Remember, Java newbie! I don't know a lot of fancy coding techniques yet.
Thank you!

String#toLowerCase converts the String to lower case, which lets you ignore case
String#lastIndexOf will tell you where the last occurrence of the specified String occurs, and will return -1 if there is no occurrence; this is important.
String#substring will allow you to generate a new String based on a sub element of the current String
Math#max, Math#min

Given String input, consider the following as a possible starting point:
int indexOfSmallA = input.lastIndexOf('a');
int indexOfBigA = input.lastIndexOf('A');
int endIndex = Math.max(indexOfSmallA, indexOfBigA);
// if not found, keep the first 2 characters (or the whole input if shorter), else keep everything up to and including the last 'a'
endIndex = (endIndex == -1) ? Math.min(2, input.length()) : endIndex + 1;
String result = input.substring(0, endIndex);

For finding the last occurrence of 'a' or 'A' you can use...
int index = Math.max(str.lastIndexOf('a'),str.lastIndexOf('A'));
index = (index==-1)?Math.min(2,str.length()):index+1;
Once you get the index you can use the following to remove the characters after it...
str.substring(0,index);
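
Putting the pieces together, a minimal sketch of the whole task might look like the following. The class name TruncateAtLastA and the method name truncate are hypothetical, and the sketch assumes that the conditional operator (?:) is allowed even though if statements are not; check that with your instructor.

```java
class TruncateAtLastA {
    // Cuts the word after its last 'a' or 'A' (or after two characters if there is none)
    // and appends the truncated length as a rounded percentage of the original length.
    static String truncate(String input) {
        int index = Math.max(input.lastIndexOf('a'), input.lastIndexOf('A'));
        // No 'a' found: keep the first two characters, or the whole input if it is shorter.
        int end = (index == -1) ? Math.min(2, input.length()) : index + 1;
        String truncated = input.substring(0, end);
        // 100.0 forces floating-point division; Math.round rounds to the closest integer.
        long percent = Math.round(100.0 * truncated.length() / input.length());
        return truncated + percent;
    }

    public static void main(String[] args) {
        System.out.println(truncate("Banana")); // Banana100
        System.out.println(truncate("hello"));  // he40
        System.out.println(truncate("cAt"));    // cA67
    }
}
```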

Optimal Solution for the following code to reduce running time

The following code takes a string input and checks whether any of its permutations is a palindrome.
The code is taking O(n) time. I was hoping for a better solution.
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        Scanner myScan = new Scanner(System.in);
        String str = myScan.nextLine();
        String a = "NO";
        permutation("", str);
        System.out.println(a);
        // Assign ans a value of YES or NO, depending on whether or not inputString satisfies the required condition
        myScan.close();
    }

    private static void permutation(String prefix, String str) {
        int n = str.length();
        String an = "NO";
        if (n == 0) {
            System.out.println(prefix);
            StringBuffer sb = new StringBuffer(prefix).reverse();
            String str1 = sb.toString();
            if (prefix.equals(str1)) {
                an = "YES";
                System.out.println(an);
                System.exit(0);
            }
        } else {
            for (int i = 0; i < n; i++) {
                permutation(prefix + str.charAt(i),
                        str.substring(0, i) + str.substring(i + 1));
            }
        }
    }
}

(This looks a bit like homework, or a Java beginner's private exercise, so I'd prefer not to give you full code but just the idea, or algorithm, so you can come up with the actual implementation yourself.)
There is no need to enumerate all the permutations and see whether one of them is a palindrome. All you need to do is to count all the letters in the word and see if there is at most one letter that has an odd number of occurrences. Take for example the palindrome racecar. It can be seen as having three parts: rac e car. The letters in the first and third part are the same, so each of those letters has to have an even count. The second part has just one kind of letter, but it can be repeated any number of times.
So, the basic algorithm is like this:
create a dictionary, map, for counting the letters, e.g. HashMap<String, Integer> in Java
for each individual character in the word, increase its count in the map by one
create a counter for odd-numbered letters, e.g. int odd_letters
for each character in the map, check whether its count is odd, and if so, increase the odd_letters counter by one
if the odd_letters counter is less than or equal to one, return true, otherwise return false
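
For readers who want to compare against their own attempt, the steps above can be sketched roughly like this. The class and method names are invented for the example, and the sketch keys the map by Character rather than String, which is one possible choice.

```java
import java.util.HashMap;
import java.util.Map;

class PalindromePermutation {
    // Returns true if some permutation of word is a palindrome.
    static boolean canFormPalindrome(String word) {
        // Count how often each character occurs.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : word.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Count the characters that occur an odd number of times.
        int oddLetters = 0;
        for (int count : counts.values()) {
            if (count % 2 != 0) {
                oddLetters++;
            }
        }
        return oddLetters <= 1;
    }
}
```

This makes one pass over the word and one pass over the distinct characters, so it runs in time linear in the length of the input.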
If you also need to know the actual palindromic permutation, if there is any, you can easily construct one from the counts map.
Let's say our word is racecar. Counts are {a: 2, c: 2, e: 1, r: 2}
for each even-numbered letter, append half of its occurrences, in any order, e.g. acr
add the odd-numbered letter in the middle, if any, as often as it was counted: acr e
finally, add the first part again, in reverse order: acr e rca
(Of course, racecar in itself already is a palindrome, but that does not matter; it's just easier to find an actual palindromic word than a word with a palindromic permutation.)
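
The construction could be sketched as follows. The names PalindromeBuilder and build are hypothetical, a TreeMap is used only so the output order is predictable, and returning null when no palindromic permutation exists is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

class PalindromeBuilder {
    // Returns one palindromic permutation of word, or null if none exists.
    static String build(String word) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : word.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        StringBuilder half = new StringBuilder();
        StringBuilder middle = new StringBuilder();
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            char c = entry.getKey();
            int count = entry.getValue();
            if (count % 2 != 0) {
                if (middle.length() > 0) {
                    return null; // a second odd-numbered letter: no palindrome possible
                }
                // The odd-numbered letter goes in the middle, as often as it was counted.
                for (int k = 0; k < count; k++) {
                    middle.append(c);
                }
            } else {
                // Half of the occurrences go in the first part.
                for (int k = 0; k < count / 2; k++) {
                    half.append(c);
                }
            }
        }
        // first part + middle + first part reversed
        return half.toString() + middle + new StringBuilder(half).reverse();
    }
}
```

For racecar this yields acrerca, matching the acr e rca walk-through above.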
Finally, note that the complexity of your code is not O(n) (assuming that n is the length of the string). You are generating all the permutations, so this alone has to be at least O(n!), as there are n! permutations.

Find Min Length Substring Containing All Given Strings

Given a large document and a short pattern consisting of a few words
(eg. W1 W2 W3), find the shortest string that has all the words in any
order (for eg. W2 foo bar dog W1 cat W3 -- is a valid pattern)
I structured the "large document" as a list of strings. I believe my solution is O(nlog(n)), but I'm not sure (I'm also not sure whether it's correct). Is there a faster way? Please note that the below is pseudocoded Java, so obviously will not compile, but I believe the message is clear:
main(){
    List<String> wordsToCheckFor;
    List<String> allWords;
    int allWordsLength = allWords.length;
    int minStringLength = POS_INFINITY;
    List<String> minString;
    //The idea here is to divide and conquer the string; I will first
    //check the entire string, then the entire string minus the first
    //word, then the entire string minus the first two words, and so on...
    for(int x = 0; x < allWordsLength; x++){
        if(checkString(allWords, wordsToCheckFor) && (allWords.length < minStringLength)){
            minString = allWords;
            minStringLength = allWords.length();
        }
        allWords.remove(0);
    }
    System.out.println(minString);
}
checkString(List<String> allWords, List<String> wordsToCheckFor){
    boolean good = true;
    foreach(String word : wordsToCheckFor){
        if(!allWords.contains(word))
            good = false;
    }
    return good;
}

Your solution has O(n^2) time complexity (in the worst case, each suffix is checked, and each check is O(n), because the List.contains method has linear time complexity). Moreover, it is not correct: the answer is not always a suffix, it can be any substring.
A more efficient solution: iterate over your text word by word and keep track of the last occurrence of each word from the pattern (using, for example, a hash table). Update the answer after each iteration (a candidate substring is the one from the minimum last occurrence among all words in the pattern to the current position). This solution has linear time complexity (under the assumption that the number of words in the pattern is a constant).
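
A minimal sketch of this idea, assuming the document is already split into a list of words and the pattern words are distinct. The names ShortestWindow and shortest are invented for the example, and returning an empty list when some pattern word never appears is also an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ShortestWindow {
    // Returns the shortest run of consecutive words that contains every pattern word,
    // or an empty list if some pattern word never occurs.
    static List<String> shortest(List<String> words, List<String> pattern) {
        Set<String> wanted = new HashSet<>(pattern);
        Map<String, Integer> lastSeen = new HashMap<>(); // pattern word -> index of its last occurrence
        int bestStart = -1;
        int bestEnd = -1;
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            if (!wanted.contains(word)) {
                continue;
            }
            lastSeen.put(word, i);
            if (lastSeen.size() < wanted.size()) {
                continue; // not every pattern word has been seen yet
            }
            // The candidate starts at the oldest "last occurrence" and ends at the current word.
            int start = Collections.min(lastSeen.values());
            if (bestStart == -1 || i - start < bestEnd - bestStart) {
                bestStart = start;
                bestEnd = i;
            }
        }
        return bestStart == -1
                ? Collections.<String>emptyList()
                : words.subList(bestStart, bestEnd + 1);
    }
}
```

Taking the minimum over lastSeen costs time proportional to the number of pattern words, which is why the overall running time is linear only when the pattern size is treated as a constant.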

RegEx to match strings that have only one C

I am looking for some tips on how I can take a string like:
KIGABCCA TQABCCAXT
GABCCASZYU GZTTABCCA MHNBABCCA CLZGABCA ABCCALZH
ABCCADQRNS VIZABCCA GABCCAG
UEKABCCA KBTOABCCA GABCCAMFFJ HABCCAISOJ OFJJABCCA HPABCCA
WBXRABCCA
ABCCAKH
VABCCAJX WBDOABCCA ABCCAWM GCABCA QHRABCCA
ABCCAMDDD WPABCCAD OGABCCA
TVABCCA JGLABCA
IUABCCA
and to return any entire string with only one C in it.
PLEASE NOTE: I AM NOT LOOKING FOR A SOLUTION!
Just some pointers or a description of the sort of constructs I should be looking at.
I have been labouring over it for ages, and have come close to hurting someone because of this. It is a homework question and I'm not looking to cheat, just some guidance.
I have read extensively about Reg Ex and I understand them.
I'm not looking for a beginners guide.

You want to first put a word boundary at the start and end. Then match any word character that isn't a C, 0 or more times, then a C, then again any word character that isn't a C, 0 or more times. So it'll match a C on its own, or a C with any non-C characters on either (or both) sides of it.
The "not a C" part you could do in two ways: say "any word character that isn't a C", or say "I want A, B or anything from D-Z". Up to you.

Search for a pattern that has the following elements, in order:
The beginning of the string or any whitespace.
Zero or more non-whitespace non-C characters.
A "C"
Zero or more non-whitespace non-C characters.
The end of the string or any whitespace.
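
For anyone who has already worked through the exercise and wants to check their result, one possible way to express the elements above in Java is sketched below. The class and method names are made up, and using lookarounds for "beginning/end of the string or whitespace" is just one choice that keeps the whitespace out of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCMatcher {
    // (?<!\S)   preceded by whitespace or the beginning of the input
    // [^\sC]*   zero or more non-whitespace, non-C characters
    // C         exactly one C
    // [^\sC]*   zero or more non-whitespace, non-C characters
    // (?!\S)    followed by whitespace or the end of the input
    private static final Pattern ONE_C = Pattern.compile("(?<!\\S)[^\\sC]*C[^\\sC]*(?!\\S)");

    // Returns every whitespace-separated word in text that contains exactly one C.
    static List<String> wordsWithOneC(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = ONE_C.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }
}
```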

You can create a count function, then pass each string to it. Just an example:
// Returns true if ch occurs exactly once in string.
public static boolean countChar(String string, char ch) {
    int count = 0;
    for (int i = 0; i < string.length(); i++) {
        if (string.charAt(i) == ch) {
            count++;
        }
    }
    return count == 1;
}
// Example: countChar("KIGABCCA", 'C') returns false, because that string has two Cs.

Method to find similar substrings from two strings

I'm using this piece of Java code to find similar strings:
if( str1.indexOf(str2) >= 0 || str2.indexOf(str1) >= 0 ) .......
but with str1 = "pizzabase" and str2 = "namedpizzaowl" it doesn't work.
How do I find the common substrings, i.e. "pizza"?

Iterate over each letter in str1, checking for its existence in str2. If it doesn't exist, move on to the next letter; if it does, increase the length of the substring in str1 that you check for in str2 to two characters, and repeat until no further matches are found or you have iterated through str1.
This will find all shared substrings, but it is, like bubble sort, hardly optimal while being a very basic example of how to solve the problem.
Something like this pseudo-ish example:
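pos = 0
len = 1
matches = [];
while (pos < str1.length()) {
    // grow the candidate str1[pos, pos + len) while it still fits in str1 and occurs in str2
    while (pos + len <= str1.length() && str2.indexOf(str1.substring(pos, pos + len)) >= 0) {
        len++;
    }
    // the last candidate that matched had length len - 1; skip positions with no match at all
    if (len > 1) {
        matches.push(str1.substring(pos, pos + len - 1));
    }
    pos++;
    len = 1;
}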
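
A runnable Java variant of the same brute-force idea, keeping only the longest match instead of collecting all of them, might look like this. The names CommonSubstring and longest are assumptions for the example.

```java
class CommonSubstring {
    // Returns the longest substring of str1 that also occurs in str2 ("" if there is none).
    static String longest(String str1, String str2) {
        String best = "";
        for (int pos = 0; pos < str1.length(); pos++) {
            int len = 0;
            // Extend the candidate by one character while it still occurs in str2.
            while (pos + len < str1.length()
                    && str2.contains(str1.substring(pos, pos + len + 1))) {
                len++;
            }
            if (len > best.length()) {
                best = str1.substring(pos, pos + len);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest("pizzabase", "namedpizzaowl")); // pizza
    }
}
```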

If your algorithm says two strings are similar when they contain a common substring, then this algorithm will always return true; the empty string "" is trivially a substring of every string. Also it makes more sense to determine the degree of similarity between strings, and return a number rather than a boolean.
This is a good algorithm for determining string (or more generally, sequence) similarity: http://en.wikipedia.org/wiki/Levenshtein_distance.
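
As a sketch of what computing that distance involves, here is the standard dynamic-programming formulation with two rows. The class and method names are chosen for this example; a smaller result means the strings are more similar.

```java
class Levenshtein {
    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```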
