Longest Most Common Substring Based on Whole-Word Phrases

I've been doing a lot of research around this topic and can't quite crack this one easily. There are a lot of valuable solutions I've come across online for solving this problem based on characters, but how would you solve this problem based on whole-word phrases to avoid the result returning a phrase that contains a partial word at the start or end of the phrase?
For example, given an Array of Strings, the output would be the most common whole-word phrase that is contained in most (not all) of the Strings within the Array.
This example below is the closest I've found so far but it only works about half of the time and includes partial word results which isn't quite what I'm after. I'm sure someone has solved this one before.
// function to find the stem (longest common
// substring) from the string array
public static String findstem(String arr[])
{
// Determine size of the array
int n = arr.length;
// Take first word from array as reference
String s = arr[0];
int len = s.length();
String res = "";
for (int i = 0; i < len; i++) {
for (int j = i + 1; j <= len; j++) {
// generating all possible substrings
// of our reference string arr[0] i.e s
String stem = s.substring(i, j);
int k = 1;
for (k = 1; k < n; k++)
// Check if the generated stem is
// common to all words
if (!arr[k].contains(stem))
break;
// If current substring is present in
// all strings and its length is greater
// than current result
if (k == n && res.length() < stem.length())
res = stem;
}
}
return res;
}
// Driver Code
public static void main(String args[])
{
String arr[] = { "grace", "graceful", "disgraceful",
"gracefully" };
String stems = findstem(arr);
System.out.println(stems);
}

Does this do what you intended? It simply checks whether any word in the array is a substring of itself and the others.
If you want to check for real word substrings you would need to reference some dictionary which would be very time consuming.
String arr[] = { "grace", "graceful", "disgraceful",
"gracefully" };
String save = "";
int count = 0;
for (int i = 0; i < arr.length && count != arr.length; i++) {
count = 0;
for (int k = 0; k < arr.length; k++) {
if (arr[k].contains(arr[i])) {
count++;
save = arr[i];
}
}
}
System.out.println(save);
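For the whole-word requirement in the question, one possible approach is to split each string into words and compare word sequences instead of character substrings. The sketch below is only an illustration under several assumptions that are not in the original posts: the class name WholeWordPhrase, the method longestCommonPhrase and the minCount parameter are hypothetical, words are separated by whitespace, matching is case-sensitive, and punctuation is not stripped. It returns the longest phrase (in words) that occurs in at least minCount of the inputs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WholeWordPhrase {

    // Hypothetical helper: longest whole-word phrase found in at least
    // minCount of the inputs. Ties go to the phrase found in more inputs,
    // then to the alphabetically smaller phrase.
    static String longestCommonPhrase(String[] inputs, int minCount) {
        // phrase -> number of inputs that contain it
        Map<String, Integer> docCount = new HashMap<>();
        for (String input : inputs) {
            if (input.trim().isEmpty()) {
                continue; // nothing to split
            }
            String[] words = input.trim().split("\\s+");
            // collect every contiguous word sequence once per input
            Set<String> seen = new HashSet<>();
            for (int i = 0; i < words.length; i++) {
                StringBuilder phrase = new StringBuilder();
                for (int j = i; j < words.length; j++) {
                    if (j > i) {
                        phrase.append(' ');
                    }
                    phrase.append(words[j]);
                    seen.add(phrase.toString());
                }
            }
            for (String p : seen) {
                docCount.merge(p, 1, Integer::sum);
            }
        }

        String best = "";
        int bestWords = 0;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : docCount.entrySet()) {
            String phrase = e.getKey();
            int count = e.getValue();
            if (count < minCount) {
                continue;
            }
            int words = phrase.split(" ").length;
            boolean better = words > bestWords
                    || (words == bestWords && count > bestCount)
                    || (words == bestWords && count == bestCount
                        && phrase.compareTo(best) < 0);
            if (better) {
                best = phrase;
                bestWords = words;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] arr = {
            "the quick brown fox",
            "a quick brown dog",
            "quick brown cats nap",
            "slow green turtle"
        };
        // at least 3 of the 4 strings must contain the phrase
        System.out.println(longestCommonPhrase(arr, 3));
    }
}
```

Because whole words are compared, "graceful" no longer matches inside "disgraceful". The number of phrases grows quadratically with the words per string, so this is meant for short inputs.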

Related

There is string "naveen" , want output as "eennav"

I have the string "naveen" in Java and I want the output "eennav". Please help me out with this. The idea is that the characters are to be ordered by frequency and, where frequencies are the same, alphabetically.
Thanks
I have tried to find duplicates in the string, but I am not able to get the required output.
String str="naveen";
int count =0;
char[] charr=str.toCharArray();
for(int i=0;i<charr.length;i++) {
//System.out.println(s[i]);
for(int j=i+1;j<charr.length;j++) {
if(charr[i]==(charr[j])) {
System.out.println(charr[i]);
}
}
}

Assuming the question is asking us to take a string containing only lower case letters and organize them by frequency of letters (high to low) and within that by alphabetical order, we can do it as below.
The strategy is first to make a pass through the characters of the input string and count the number of occurrences of each. Then we look through the counts for letters and find the largest. Working from the largest down to 1, we run through the alphabet for characters that occur that many times, and append that many of them to the result string.
There is a little bit of work involved here to convert back and forth from 'a' to 0 and 0 to 'a' and so on to 25 and 'z'.
This approach could be extended, but since the question didn't specify, I chose to make simplifying assumptions. I also did not work on optimizing, just getting it to work.
public class MyClass {
public static void main(String args[]) {
String str = "naveen"; //this string may change, but it is assumed to be all lower case letters by this implementation
int[] counts = new int[26]; //frequency count of letters a to z
for (int i = 0; i < 26; i++) {
counts[i] = 0; // initially 0 of any letter
}
char[] charr = str.toCharArray();
for (int i = 0; i < charr.length; i++) {
counts[(int)(charr[i]) - (int)('a')]++; // increment corresponding spot in counts array, spot 0 for 'a' through 25 for 'z'
}
int maxCount = counts[0]; // now find the most occurrences of any letter
for (int i = 1; i < counts.length; i++) {
if (counts[i] > maxCount) {
maxCount = counts[i];
}
}
String result = ""; // string to return
for (int j = maxCount; j > 0; j--) { // work down from most frequently occurring, within that alphabetically
for (int i = 0; i < 26; i++) {
if (counts[i] == j) {
//System.out.println("there are "+j+" of the letter "+(char)((int)('a'+i)));
for (int k = 0; k < j; k++) {
result = result + (char)((int)('a' + i));
}
}
}
}
System.out.println(result);
}
}
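The same ordering (frequency high to low, then alphabetical) can also be expressed as a sort with a two-level comparator. This is only a sketch under the same assumption as above, that the input contains only lower case letters a to z; the class and method names (FrequencySort, sortByFrequency) are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class FrequencySort {

    // Hypothetical helper: orders characters by descending frequency,
    // then alphabetically. Assumes only 'a'..'z' in the input.
    static String sortByFrequency(String str) {
        int[] counts = new int[26]; // Java zero-initializes int arrays
        for (char c : str.toCharArray()) {
            counts[c - 'a']++;
        }
        // Box the characters so a Comparator can be used
        Character[] letters = new Character[str.length()];
        for (int i = 0; i < letters.length; i++) {
            letters[i] = str.charAt(i);
        }
        Arrays.sort(letters, Comparator
                .comparingInt((Character c) -> -counts[c - 'a']) // most frequent first
                .thenComparing(Comparator.naturalOrder()));      // then a to z
        StringBuilder result = new StringBuilder(letters.length);
        for (char c : letters) {
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortByFrequency("naveen")); // prints "eennav"
    }
}
```

The counting version above avoids the sort entirely, so it remains the cheaper option for long strings; the comparator version is mainly shorter to read.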

Big O of String repetition append in Java

I have a function to get a string as a repetition of an original string. I'm wondering, if I use StringBuilder append, what is the Big O of the function? Is it O(nl), where n is the number of repeats and l is the length of the original string?
public String getRepetition(String originalStr, Integer n){
StringBuilder str = new StringBuilder();
for(int i = 0; i < n; i++)
str.append(originalStr);
return str.toString();
}
Comparing with the approach below, which one is better?
public String getRepetition(String originalStr, Integer n){
String str = "";
for(int i = 0; i < n; i++)
str += originalStr;
return str;
}

I'm not sure why the other three answers all say both pieces of code are O(n). Assuming originalStr is not "", the first is O(n) and the other is O(n^2)! (That's an exclamation, not a factorial.) They teach this on the first day of Java school. C programmers get "don't use strlen in the condition of that for loop"; Java programmers get this.
String str = "";
for(int i = 0; i < n; i++)
str += originalStr;
Each time around this loop str gets longer: at the start of iteration i its length is i * originalStr.length(). Each += creates a new String (assuming no wild compiler optimisations), which takes time roughly proportional to i.
Edit: Usually we ignore the length of the original string. But yeah, of course it's going to be proportional, so O(n*strlen(originalStr)) and O(n*n*strlen(originalStr)). By convention this is dealt with separately.
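As a rough check of those two bounds, the copy costs can be added up directly. The symbols here are shorthand introduced for this sketch: l is originalStr.length() and n is the number of repeats.

```latex
% String +=: iteration i copies the i*l existing chars plus l new ones
\sum_{i=0}^{n-1} (i+1)\,l \;=\; l\,\frac{n(n+1)}{2} \;=\; O(n^2\,l)

% StringBuilder: n*l appended chars, plus resize copies that form a
% geometric series bounded by a constant multiple of the final length
n\,l + O(n\,l) \;=\; O(n\,l)
```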
If we rewrite the code without the String abstraction, perhaps it will be clearer.
public static char[] getRepetition(char[] originalStr, int n) {
char[] str = {};
for (int i = 0; i < n; ++i) {
assert str.length == i * originalStr.length;
char[] newStr = new char[str.length + originalStr.length];
for (int j=0; j<str.length; ++j) {
newStr[j] = str[j];
}
for (int j=0; j<originalStr.length; ++j) {
newStr[str.length+j] = originalStr[j];
}
str = newStr;
}
return str;
}
(As ever, I've not bothered to so much as compile the code. Not safe to use in a nuclear reactor.)
Just for giggles, let's deabstract the first implementation.
public static char[] getRepetition(char[] originalStr, int n) {
char[] str = new char[16];
int strLen = 0;
for (int i = 0; i < n; ++i) {
assert strLen == i * originalStr.length;
// ensureCapacity
if (str.length < strLen + originalStr.length) {
// The size at least doubles,
// so this happens increasingly less often.
// It won't happen again for approximately
// the same number of iterations
// as have already happened!
char[] newStr = new char[Math.max(
str.length + originalStr.length, // always enough room for this append
str.length*2 + 2 // *2 !
)];
for (int j=0; j<strLen; ++j) {
newStr[j] = str[j];
}
str = newStr;
}
// actual append
for (int j=0; j<originalStr.length; ++j) {
str[strLen++] = originalStr[j];
}
}
// toString
char[] newStr = new char[strLen];
for (int i=0; i<newStr.length; ++i) {
newStr[i] = str[i];
}
return newStr;
}

Both of your approaches are O(n) while the first approach eliminates several temporary String(s). It isn't clear why you have made n an Integer, nor why you have not made this a static method (it depends on no instance state). Additionally, in Java 8+, you could implement it with a lambda like
public static String getRepetition(String originalStr, int n) {
return Stream.generate(() -> originalStr).limit(n).collect(Collectors.joining());
}
Also, if you're going to use a StringBuilder as in your first example, you can explicitly size it to avoid having to amortize the cost of resizing the StringBuilder
StringBuilder str = new StringBuilder(originalStr.length() * n);

In both the cases the complexity is O(n) because you are iterating n times.
The only difference in second approach is you are creating new String in each iteration i.e. at str += originalStr;

I'm not able to sort the split strings properly using the compareTo method. What can I do to correct it?

This is my code, but I know this is not right. I have written a lot of code for such a simple task.
Sample input is:
welcome
Sample output is:
com
elc
lco
ome
wel
It should print:
your first string is 'com'
and
your last string is 'wel'
Code:
import java.io.*;
import java.util.*;
public class Solution {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
int k = sc.nextInt();
int k1 = k;
int j = 0;
int t = str.length();
String [] s = new String [1000];
for (int i = t, a = 0; i >= k; i--, a++) {
s[a] = str.substring(j, k1);
j++;
k1++;
}
String[] s1 = new String[j];
for (int i = 0 ; i < j; i++) {
s1[i] = s[i];
}
for (int y = 0; y < j; y++) {
for (int z = y + 1; z < j; z++) {
if(s1[z].compareTo(s1[y]) < 0) {
String temp = s1[z];
s1[z] = s1[y];
s1[y] = temp;
}
}
}
System.out.println(s1[0]);
System.out.println(s1[1]);
}
}
Note: I split my strings, but I'm not able to arrange strings in alphabetical order, and feel that I have used a lot of arrays. Is there a better way to do this?

You can reduce the number of variables, use collections (a List in this case) instead of arrays to avoid having to set a size (1000), and sort using the framework:
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
int k = sc.nextInt();
List<String> cutStrings = new ArrayList<String>();
for (int i = 0; i <= str.length() - k; i++) { // <= so the last substring ("ome") is included
cutStrings.add(str.substring(i, i + k));
}
Collections.sort(cutStrings);
System.out.println(cutStrings.get(0));
System.out.println(cutStrings.get(cutStrings.size()-1));

You can easily sort your String[] array by simply using
Arrays.sort(s);
This will sort your strings in the default order. If you need any other kind of order you can pass the comparator as a second parameter.
You can get first and last by getting s[0] and s[s.length-1]
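As a small illustration of both forms of Arrays.sort, here is a sketch that builds the length-k substrings of "welcome" and sorts them in the default order and then in reverse order with a comparator. The class name SortSubstrings and the helper substrings are hypothetical, and the helper assumes 1 <= k.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortSubstrings {

    // Hypothetical helper: all substrings of length k, in order of position.
    // Assumes k >= 1; returns an empty array if k exceeds the string length.
    static String[] substrings(String str, int k) {
        int count = Math.max(0, str.length() - k + 1);
        String[] s = new String[count];
        for (int i = 0; i < count; i++) {
            s[i] = str.substring(i, i + k);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] s = substrings("welcome", 3);

        Arrays.sort(s); // default (lexicographic) order
        System.out.println("first: " + s[0] + ", last: " + s[s.length - 1]);

        // Any other order: pass a Comparator as the second argument
        Arrays.sort(s, Comparator.reverseOrder());
        System.out.println(Arrays.toString(s));
    }
}
```

Sizing the array from the input length avoids both the fixed size of 1000 and the null entries that an oversized array would hand to the sort.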

I did a quick implementation of your requirements. It might not be exactly what you're looking for but it should get you started. :)
So, I used an ArrayList to grab the substrings and then used the Collections library to do the sorting for me. This is just one of the many ways of solving the problem, btw. The input word can vary in size so I felt that a list would be appropriate for this situation.
String s = "welcome";
List<String> words = new ArrayList<String>();
for (int i = 0; i < s.length() - 2; i++) {
String chunk = s.charAt(i) + "" + s.charAt(i + 1) + ""
+ s.charAt(i + 2);
words.add(chunk);
System.out.println(chunk);
}
Collections.sort(words);
System.out.println(words.toString());
Feel free to let me know if you have any questions or if I have made a mistake in the code.
Good luck!

The actual problem in your code is the splitting; the sorting will work. With j = 0 and k1 = 3 the substring is wel. On the next iteration (after both j and k1 are incremented by 1), j = 1 and k1 = 4 give elc, and so on.
So, instead of
String [] s = new String [1000];
for (int i = t, a = 0; i >= k; i--, a++) {
s[a] = str.substring(j, k1);
j++;
k1++;
}
use
int k = sc.nextInt();
String [] s = new String [(str.length()/3)+1] ;
for ( int i = 0,a = 0; i<(str.length()-k); i+=k,a++)
{
s[a] = str.substring(i,(i+k));
System.out.println(s[a]);
}
s[s.length-1]=str.substring((str.length()-k),str.length());//to add remaining values
Arrays.sort(s);//sorting alphabetically
for(int i = 0; i < s.length; i++)
System.out.println(s[i]);
}
The value of i is incremented by k in the for loop (i+=k), where k=3.
Output:
amp
com
e s
ple
wel

Sorting two dimensional String array as per LastName without any Collection API

I attended an interview yesterday for an organisation. I did everything else well, but I was unable to work out the logic for this question:
String [][]Names={{"John","Pepper"},
{"Smith","Adams"},
{"Katpiller","RhodSon"},
{"BillMark","pearson"}
};
The array above needs to be sorted by last name without using Collections.sort, compareTo() or any other Collections API.
My Implementation is :
String str[]=new String[3];
for(int j=0; j<Name.length;j++)
{
for (int i=0 ; i<2; i++)
{
str[i]=Name[j][i];
}
for(int i=0;i<str.length;i++)
{
for(int k=i+1;k<str.length;k++)
{
if(str[i].compareTo(str[k])>0)
{
String temp= str[i];
str[i]=str[k];
str[k]=temp;
}
}
System.out.print(str[i]+ " ");
}
System.out.println();
}
Please help me over this hurdle. Thanks in advance.

for(int j = Names.length - 2; j >= 0; j--)
{
String name = Names[j][0];
String lastName = Names[j][1];
int i = j + 1;
while(i < Names.length && strCompare(Names[i][1],lastName) < 0)
{
Names[i - 1][0] = Names[i][0];
Names[i - 1][1] = Names[i][1];
i = i + 1;
}
Names[i - 1][0] = name;
Names[i - 1][1] = lastName;
}
for(String[] name : Names)
System.out.print(Arrays.toString(name));
This is an implementation of insertion sort, which sorts in place without needing additional space and works well on small inputs.
And here is my own really bad implementation of string comparison (in place of String#compareTo), just to give you an idea of how to do it (I don't know if it works in all cases).
int strCompare(String str1, String str2){
//gets char arrays for both strings.
char[] strChar1 = str1.toUpperCase().toCharArray();
char[] strChar2 = str2.toUpperCase().toCharArray();
//gets max length of the strings
int maxLength = strChar1.length > strChar2.length? strChar1.length : strChar2.length;
//loops character by character to compare them.
for(int i = 0; i < maxLength; i++)
{
//if the length of the char array 1 is less than the index, then its smaller than str2.
if(strChar1.length <= i)
return -1;
//if the length of the char array 2 is less than the index, then its smaller than str1.
if(strChar2.length <= i)
return 1;
//compare characters and return -1 or 1 depending which one is larger.
if(strChar1[i] < strChar2[i]){
return -1;
} else if (strChar1[i] > strChar2[i]){
return 1;
}
}
//return 0 to say that they are equal if the loop doesnt return anything.
return 0;
}
Hope it helps.
Output : [Smith, Adams][BillMark, pearson][John, Pepper][Katpiller, RhodSon]

IMHO there are a few questions to ask before trying to solve this problem:
should the sort be case sensitive (I suppose not)
should the sort be optimized for a great number of names (I suppose not)
should the array be sorted in place (I suppose yes)
If the answer to the second question is yes, a proper algorithm must be implemented (quick sort, merge sort) and the lowercased names must be pre-calculated; if not, a simple selection sort is enough (O(n^2/2) comparisons, but simpler to implement):
int k;
String[] temp;
for (int i = 0; i < Names.length - 1; i++) {
k = i; // index of the smallest last name found so far
for (int j = i + 1; j < Names.length; j++) {
if (Names[j][1].toLowerCase().compareTo(Names[k][1].toLowerCase()) < 0) {
k = j;
}
}
if (k != i) {
// swap the whole {firstName, lastName} rows
temp = Names[i];
Names[i] = Names[k];
Names[k] = temp;
}
}
If String.compareTo is not allowed, it simply has to be re-implemented. I suppose there are no accented chars in the last names (é, è, ö, ü, etc.).
int compare(String a, String b) {
// compare only the positions both strings have
int n = (a.length() < b.length()) ? a.length() : b.length();
int delta;
for (int i = 0; i < n; i++) {
delta = a.charAt(i) - b.charAt(i);
if (delta != 0) {
return delta;
}
}
// one string is a prefix of the other: the shorter one sorts first
return a.length() - b.length();
}
and the line if (Names[j][1].toLowerCase().compareTo(Names[k][1].toLowerCase()) < 0) { has to be rewritten as:
if (compare(Names[j][1].toLowerCase(), Names[k][1].toLowerCase()) < 0) {
This algorithm is really badly optimized, but it was simple to write (and would be simple to test). It is then possible to add as needed:
a better sort algorithm (quick sort)
pre-calculation of the lowercase last names
management of accented characters.

Getting all combinations of values from many lists

I'm trying to resolve all the combinations of elements based on a given string.
The string is like this :
String result="1,2,3,###4,5,###6,###7,8,";
The number of element between ### (separated with ,) is not determined and the number of "list" (part separated with ###) is not determined either.
NB : I use number in this example but it can be String too.
And the expected result in this case is a string containing :
String result = "1467, 1468, 1567, 1568, 2467, 2468, 2567, 2568, 3467, 3468, 3567, 3568"
So as you can see the elements in result must start with an element of the first list then the second element must be an element of the second list etc...
So far I have written this algorithm, which works but is slow:
String [] parts = result.split("###");
if(parts.length>1){
result="";
String stack="";
int i;
String [] elmts2=null;
String [] elmts = parts[0].split(",");
for(String elmt : elmts){ //Browse root elements
if(elmt.trim().isEmpty())continue;
/**
* This array is used to store the next index to use for each row.
*/
int [] elmtIdxInPart= new int[parts.length];
//Loop until the root element index change.
while(elmtIdxInPart[0]==0){
stack=elmt;
//Add to the stack an element of each row, chosen by index (elmtIdxInPart)
for(i=1 ; i<parts.length;i++){
if(parts[i].trim().isEmpty() || parts[i].trim().equals(","))continue;
String part = parts[i];
elmts2 = part.split(",");
stack+=elmts2[elmtIdxInPart[i]];
}
//rollback i to previous used index
i--;
if(elmts2 == null){
elmtIdxInPart[0]=elmtIdxInPart[0]+1;
}
//Check if all elements in the row have been used.
else if(elmtIdxInPart[i]+1 >=elmts2.length || elmts2[elmtIdxInPart[i]+1].isEmpty()){
//Make evolve previous row that still have unused index
int j=1;
while(elmtIdxInPart[i-j]+1 >=parts[i-j].split(",").length ||
parts[i-j].split(",")[elmtIdxInPart[i-j]+1].isEmpty()){
if(j+1>i)break;
j++;
}
int next = elmtIdxInPart[i-j]+1;
//Init the next row to 0.
for(int k = (i-j)+1 ; k <elmtIdxInPart.length ; k++){
elmtIdxInPart[k]=0;
}
elmtIdxInPart[i-j]=next;
}
else{
//Make evolve index in current row, init the next row to 0.
int next = elmtIdxInPart[i]+1;
for(int k = (i+1) ; k <elmtIdxInPart.length ; k++){
elmtIdxInPart[k]=0;
}
elmtIdxInPart[i]=next;
}
//Store full stack
result+=stack+",";
}
}
}
else{
result=parts[0];
}
I'm looking for a more performant algorithm if it's possible. I made it from scratch without thinking about any mathematical algorithm. So I think I made a tricky/slow algo and it can be improved.
Thanks for your suggestions and thanks for trying to understand what I've done :)
EDIT
Using Svinja's proposal divides the execution time by 2:
StringBuilder res = new StringBuilder();
String input = "1,2,3,###4,5,###6,###7,8,";
String[] lists = input.split("###");
int N = lists.length;
int[] length = new int[N];
int[] indices = new int[N];
String[][] element = new String[N][];
for (int i = 0; i < N; i++){
element[i] = lists[i].split(",");
length[i] = element[i].length;
}
// solve
while (true)
{
// output current element
for (int i = 0; i < N; i++){
res.append(element[i][indices[i]]);
}
res.append(",");
// calculate next element
int ind = N - 1;
for (; ind >= 0; ind--)
if (indices[ind] < length[ind] - 1) break;
if (ind == -1) break;
indices[ind]++;
for (ind++; ind < N; ind++) indices[ind] = 0;
}
System.out.println(res);

This is my solution. It's in C# but you should be able to understand it (the important part is the "calculate next element" section):
static void Main(string[] args)
{
// parse the input, this can probably be done more efficiently
string input = "1,2,3,###4,5,###6,###7,8,";
string[] lists = input.Replace("###", "#").Split('#');
int N = lists.Length;
int[] length = new int[N];
int[] indices = new int[N];
for (int i = 0; i < N; i++)
length[i] = lists[i].Split(',').Length - 1;
string[][] element = new string[N][];
for (int i = 0; i < N; i++)
{
string[] list = lists[i].Split(',');
element[i] = new string[length[i]];
for (int j = 0; j < length[i]; j++)
element[i][j] = list[j];
}
// solve
while (true)
{
// output current element
for (int i = 0; i < N; i++) Console.Write(element[i][indices[i]]);
Console.WriteLine(" ");
// calculate next element
int ind = N - 1;
for (; ind >= 0; ind--)
if (indices[ind] < length[ind] - 1) break;
if (ind == -1) break;
indices[ind]++;
for (ind++; ind < N; ind++) indices[ind] = 0;
}
}
Seems kind of similar to your solution. Does this really have bad performance? Seems to me that this is clearly optimal, as the complexity is linear with the size of the output, which is always optimal.
edit: by "similar" I mean that you also seem to do the counting with indexes thing. Your code is too complicated for me to go into after work. :D
My index adjustment works very simply: starting from the right, find the first index we can increase without overflowing, increase it by one, and set all the indexes to its right (if any) to 0. It's basically counting in a number system where each digit is in a different base. Once we can't even increase the first index any more (which means we can't increase any, as we started checking from the right), we're done.

Here is a somewhat different approach:
static void Main(string[] args)
{
string input = "1,2,3,###4,5,###6,###7,8,";
string[] lists = input.Replace("###", "#").Split('#');
int N = lists.Length;
int[] length = new int[N];
string[][] element = new string[N][];
int outCount = 1;
// get each string for each position
for (int i = 0; i < N; i++)
{
string list = lists[i];
// fix the extra comma at the end
if (list.Substring(list.Length - 1, 1) == ",")
list = list.Substring(0, list.Length - 1);
string[] strings = list.Split(',');
element[i] = strings;
length[i] = strings.Length;
outCount *= length[i];
}
// prepare the output array
string[] outstr = new string[outCount];
// produce all of the individual output strings
string[] position = new string[N];
for (int j = 0; j < outCount; j++)
{
// working value of j:
int k = j;
for (int i = 0; i < N; i++)
{
int c = length[i];
int q = k / c;
int r = k - (q * c);
k = q;
position[i] = element[i][r];
}
// combine the chars
outstr[j] = string.Join("", position);
}
// join all of the strings together
//(note: joining them all at once is much faster than doing it
//incrementally, if a mass concatenate facility is available
string result = string.Join(", ", outstr);
Console.Write(result);
}
I am not a Java programmer either, so I adapted Svinja's c# answer to my algorithm, assuming that you can convert it to Java also. (thanks to Svinja..)
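For readers who want the Java side, here is a rough sketch of the same division-and-remainder idea. It is an adaptation, not the answerer's code: the class name Combinations and the method combine are hypothetical, and the inner loop walks from the last list to the first so that the output comes out in the order shown in the question (1467, 1468, 1567, ...). It relies on String.split dropping trailing empty strings, so the trailing comma in each list needs no special handling.

```java
import java.util.ArrayList;
import java.util.List;

public class Combinations {

    // Hypothetical helper: input looks like "1,2,3,###4,5,###6,###7,8,"
    static List<String> combine(String input) {
        String[] lists = input.split("###");
        int n = lists.length;
        String[][] element = new String[n][];
        int outCount = 1;
        for (int i = 0; i < n; i++) {
            // split(",") discards the empty string after the final comma
            element[i] = lists[i].split(",");
            outCount *= element[i].length;
        }

        List<String> out = new ArrayList<>(outCount);
        String[] position = new String[n];
        for (int j = 0; j < outCount; j++) {
            int k = j; // working value of j
            // Decode j as a mixed-radix number, last list = lowest digit
            for (int i = n - 1; i >= 0; i--) {
                int c = element[i].length;
                position[i] = element[i][k % c];
                k /= c;
            }
            out.add(String.join("", position));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = combine("1,2,3,###4,5,###6,###7,8,");
        // join everything once at the end instead of concatenating in a loop
        System.out.println(String.join(", ", result));
    }
}
```

Since outCount is an int, this sketch assumes the total number of combinations fits in 32 bits; larger inputs would need a long counter or the index-increment approach from the previous answer.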
