I know this problem is probably best solved with DP, but I was wondering whether it is possible to solve it with brute-force recursion.
Given a set of words, say {"sales", "person", "salesperson"}, determine which words are compound (that is, a combination of 2 or more words in the list). In this case, salesperson = sales + person, so it is compound.
I based my answer heavily on this problem: http://www.geeksforgeeks.org/dynamic-programming-set-32-word-break-problem/
public static void main(String args[]) throws Exception {
String[] test = { "salesperson", "sales", "person" };
String[] output = simpleWords(test);
for (int i = 0; i < output.length; i++)
System.out.println(output[i]);
}
static String[] simpleWords(String[] words) {
if (words == null || words.length == 0)
return null;
ArrayList<String> simpleWords = new ArrayList<String>();
for (int i = 0; i < words.length; i++) {
String word = words[i];
Boolean isCompoundWord = breakWords(words, word);
if (!isCompoundWord)
simpleWords.add(word);
}
String[] retVal = new String[simpleWords.size()];
for (int i = 0; i < simpleWords.size(); i++)
retVal[i] = simpleWords.get(i);
return retVal;
}
static boolean breakWords(String[] words, String word) {
int size = word.length();
if (size == 0 ) return true;
for (int j = 1; j <= size; j++) {
if (compareWords(words, word.substring(0, j)) && breakWords(words, word.substring(j, word.length()))) {
return true;
}
}
return false;
}
static boolean compareWords(String[] words, String word) {
for (int i = 0; i < words.length; i++) {
if (words[i].equals(word))
return true;
}
return false;
}
The problem is that while it successfully identifies salesperson as a compound word, it also identifies sales and person as compound words, because each word trivially matches itself. Can this code be revised so that the recursive solution works? I'm having trouble seeing how to do this easily.
Here is a recursive solution:
public static String[] simpleWords(String[] data) {
List<String> list = new ArrayList<>();
for (String word : data) {
if (!isCompound(data, word)) {
list.add(word);
}
}
return list.toArray(new String[list.size()]);
}
public static boolean isCompound(String[] data, String word) {
return isCompound(data, word, 0);
}
public static boolean isCompound(String[] data, String word, int iteration) {
if (data == null || word == null || word.trim().isEmpty()) {
return false;
}
for (String str : data) {
if (str.equals(word) && iteration > 0) {
return true;
}
if (word.startsWith(str)) {
String subword = word.substring(str.length());
if (isCompound(data, subword, iteration + 1)) {
return true;
}
}
}
return false;
}
Just call it like this:
String[] data = {"sales", "person", "salesperson"};
System.out.println(Arrays.asList(simpleWords(data)));
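The same idea can be sketched with a HashSet for lookups and an explicit count of the parts used so far, so that a word only counts as compound when it splits into at least two entries. This is a minimal sketch, not the answer's code; the class name CompoundWords and the helper canSplit are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CompoundWords {
    // Returns the words that cannot be built from two or more entries of the input.
    static List<String> simpleWords(String[] words) {
        Set<String> dict = new HashSet<>(Arrays.asList(words));
        List<String> simple = new ArrayList<>();
        for (String word : words) {
            if (!canSplit(dict, word, 0, 0)) {
                simple.add(word);
            }
        }
        return simple;
    }

    // True if word.substring(start) can be split into dictionary words and the
    // whole word uses at least two parts (parts counts the pieces used so far).
    // Every piece is non-empty because end always exceeds start.
    static boolean canSplit(Set<String> dict, String word, int start, int parts) {
        if (start == word.length()) {
            return parts >= 2;
        }
        for (int end = start + 1; end <= word.length(); end++) {
            if (dict.contains(word.substring(start, end))
                    && canSplit(dict, word, end, parts + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] data = {"sales", "person", "salesperson"};
        System.out.println(simpleWords(data)); // [sales, person]
    }
}
```

This is still brute force and exponential in the worst case; caching the result per start index would turn it into the DP version.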
Recently I took part in a Java coding challenge at my college and was asked this problem, which I found difficult to implement.
The problem was to implement a method detect that, given two linked lists, returns the index at which the second list occurs as a sublist of the first.
detect((1,2,3),(2,3)) should return 1
The node structure of the list was
class LinkedListNode {
String val;
LinkedListNode next;
}
and the method signature was
static int detect(LinkedListNode list, LinkedListNode sublist)
What would be the basic algorithm to approach this problem? I am a newbie to data structures.
I believe Collections.indexOfSubList implements this. You can look at its implementation.
Basically:
ListIterator<?> si = source.listIterator();
nextCand:
for (int candidate = 0; candidate <= maxCandidate; candidate++) {
ListIterator<?> ti = target.listIterator();
for (int i=0; i<targetSize; i++) {
if (!eq(ti.next(), si.next())) {
// Back up source iterator to next candidate
for (int j=0; j<i; j++)
si.previous();
continue nextCand;
}
}
return candidate;
}
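For reference, calling the library method directly looks like the sketch below. It works on java.util.List values rather than on the custom LinkedListNode from the question; the class name and the detect wrapper are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class IndexOfSubListDemo {
    // Thin wrapper around the JDK method, named after the question's method.
    static int detect(List<String> source, List<String> target) {
        return Collections.indexOfSubList(source, target);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("1", "2", "3");
        System.out.println(detect(source, Arrays.asList("2", "3"))); // 1
        System.out.println(detect(source, Arrays.asList("3", "2"))); // -1
    }
}
```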
The basic idea is to try each index of the first list as a starting point and check whether the elements of the second list follow consecutively from there. The following algorithm will work for you:
public static void main(String[] args) {
List<Integer> list1 = new LinkedList<Integer>();
list1.add(1);
list1.add(2);
list1.add(3);
list1.add(4);
list1.add(5);
list1.add(6);
List<Integer> list2 = new LinkedList<Integer>();
list2.add(2);
list2.add(3);
int index = -1;
// Try every start position in list1 at which list2 could still fit
for (int start = 0; start + list2.size() <= list1.size(); start++) {
int j = 0;
// Compare consecutive elements; equals() compares the Integer values
while (j < list2.size() && list1.get(start + j).equals(list2.get(j))) {
j++;
}
if (j == list2.size()) {
// All of list2 matched starting at this position
index = start;
break;
}
}
System.out.println("Index is: " + index);
}
This prints Index is: 1 as expected.
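static int detect(LinkedListNode list, LinkedListNode sublist) {
if (sublist == null)
return 0; // assumption: an empty sublist matches at index 0
int index = 0;
// Try each node of list as a candidate start of the match
for (LinkedListNode start = list; start != null; start = start.next, index++) {
LinkedListNode a = start;
LinkedListNode b = sublist;
// Walk both lists together; val is a String, so compare with equals()
while (a != null && b != null && a.val.equals(b.val)) {
a = a.next;
b = b.next;
}
if (b == null)
return index; // every node of sublist matched
}
return -1; // sublist does not occur in list
}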
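Since the value is a String here, you can concatenate the values and use String.indexOf. Note that indexOf returns a character offset, so the result equals the node index only when every val is a single character: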
static int find(LinkedListNode list, LinkedListNode sublist) {
String listString = convertLinkedListToString(list);
String sublistString = convertLinkedListToString(sublist);
return listString.indexOf(sublistString);
}
private static String convertLinkedListToString(LinkedListNode list) {
String listAsString = "";
while(list != null) {
listAsString = listAsString + list.val;
list = list.next;
}
return listAsString;
}
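If the values can be longer than one character, one way to keep the "convert and search" idea is to copy the values into a List<String> and search with Collections.indexOfSubList, which counts elements rather than characters. This is a sketch under that assumption; the constructor on LinkedListNode and the class NodeListSearch are hypothetical additions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node class mirroring the question's structure; the constructor is added for brevity.
class LinkedListNode {
    String val;
    LinkedListNode next;

    LinkedListNode(String val, LinkedListNode next) {
        this.val = val;
        this.next = next;
    }
}

class NodeListSearch {
    // Copies the node values into a list, one element per node.
    static List<String> toList(LinkedListNode node) {
        List<String> values = new ArrayList<>();
        for (; node != null; node = node.next) {
            values.add(node.val);
        }
        return values;
    }

    // Index of the first node at which sublist starts, or -1 if it does not occur.
    static int detect(LinkedListNode list, LinkedListNode sublist) {
        return Collections.indexOfSubList(toList(list), toList(sublist));
    }

    public static void main(String[] args) {
        LinkedListNode list =
                new LinkedListNode("ab", new LinkedListNode("c", new LinkedListNode("d", null)));
        LinkedListNode sublist = new LinkedListNode("c", new LinkedListNode("d", null));
        System.out.println(detect(list, sublist)); // 1, a node index
        System.out.println("abcd".indexOf("cd"));  // 2, a character offset
    }
}
```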
(Full disclosure: this is for some homework I can't seem to figure out.)
The task: Identify duplicates in a list and add them to another ArrayList to be printed out.
Specifications: I am NOT allowed to use any collection other than an ArrayList, so I can't use something like a Set. It seems like every answer on StackOverflow recommends use of a Set, which is why I decided to ask this question.
What I've attempted so far:
public static void deleteDuplicates(List<String> list)
{
int pointer = 1;
List<String> duplicates = new ArrayList<String>();
for (int i = 0; i < list.size() - 1; i++) {
if (list.get(i).equals(list.get(pointer))) {
duplicates.add(list.get(i));
if (pointer == 1) {
duplicates.add(list.get(pointer));
} else if ((pointer + 1) == list.size() - 1) {
duplicates.add(list.get(pointer));
}
pointer++;
} else {
display(duplicates);
duplicates = new ArrayList<String>();
pointer++;
}
}
}
The test data:
List<String> duplicated = new ArrayList<String>();
duplicated.add("3");
duplicated.add("3");
duplicated.add("30");
duplicated.add("46");
duplicated.add("46");
What's not working: When the size of the list is an odd number, the duplicates report correctly. When the size of the list is an even number, only the first two duplicates are reported.
The problem with your approach is that the loop exits before it performs the if-else check for the last element. On the last iteration the if condition is satisfied and the element is added to duplicates, but the loop does not run again to reach the else part, so the last group never gets displayed. Try
public static void deleteDuplicates(List<String> list)
{
List<String> duplicates = new ArrayList<String>();
// Compare each element with its right-hand neighbour (assumes equal values are adjacent)
for (int i = 0; i < list.size() - 1; i++) {
if (list.get(i).equals(list.get(i + 1))) {
// At the start of a run, record the first element as well
if (duplicates.isEmpty()) {
duplicates.add(list.get(i));
}
duplicates.add(list.get(i + 1));
} else if (duplicates.size() > 0) {
// The run has ended: report it and start over
display(duplicates);
duplicates.clear();
}
}
// Report a run that reaches the end of the list
if(duplicates.size() > 0){
display(duplicates);
}
}
Although Syam S's answer is right, this will work for an unsorted list too:
public static void deleteDuplicates(List<String> list)
{
List<String> duplicates = new ArrayList<String>();
// Compare every pair (j, i) with j < i; the bounds must reach the last element
for (int j = 0; j < list.size() - 1; j++) {
for (int i = j + 1; i < list.size(); i++) {
if (list.get(i).equals(list.get(j))) {
duplicates.add(list.get(i));
duplicates.add(list.get(j));
}
if(duplicates.size() > 0){
System.out.println(duplicates);
duplicates.clear();
}
}
}
}
You can see the working version on Ideone.
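If each duplicated value should be reported only once, even when it occurs three or more times, a second ArrayList of already-seen values is enough. This is a minimal sketch under the question's "ArrayList only" constraint; the names DuplicateFinder and findDuplicates are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateFinder {
    // Returns each value that occurs more than once, in order of its second occurrence.
    // contains() is a linear scan, so the whole method is O(n^2).
    static List<String> findDuplicates(List<String> list) {
        List<String> seen = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        for (String item : list) {
            if (!seen.contains(item)) {
                seen.add(item);           // first occurrence
            } else if (!duplicates.contains(item)) {
                duplicates.add(item);     // second occurrence: report once
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        data.add("3");
        data.add("46");
        data.add("3");
        data.add("30");
        data.add("46");
        data.add("3");
        System.out.println(findDuplicates(data)); // [3, 46]
    }
}
```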
Try extending ArrayList so that add skips values that are already in the list:
1)
boolean result = false;
if(!contains(object))
result= super.add(object);
return result;
OR
2)
ArrayList<String> myList = new ArrayList<String>()
{
@Override
public boolean add(String object)
{
boolean present = false;
boolean result = false;
for(int i=0;i<size();i++)
{
if(object.equals(get(i)))
{
present = true;
break;
}
}
if(!present)
result= super.add(object);
return result;
}
};
myList.add("1");
myList.add("2");
myList.add("3");
myList.add("1");
myList.add("2");
myList.add("3");
myList.add("1");
myList.add("1");
System.out.println(myList);
I am struggling with a "find supersequence" algorithm.
The input is a set of strings:
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
The result would be a properly aligned set of strings (the next step would be to merge them):
String E = "ca ag cca cc ta cat c a";
String F = "c gag ccat ccgtaaa g tt g";
String G = " aga acc tgc taaatgc t a ga";
Thank you for any advice (I have been stuck on this task for more than a day).
After the merge, the supersequence would be
cagagaccatgccgtaaatgcattacga
The definition of supersequence in this case would be something like:
A string R is contained in a supersequence S if and only if all characters of R are present in S in the order in which they occur in R.
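The definition above translates into a short check, which is handy for validating any candidate result. This is a sketch; the class and method names are hypothetical.

```java
class SupersequenceCheck {
    // True if all characters of r appear in s in the same order (r is a subsequence of s).
    static boolean contains(String s, String r) {
        int matched = 0; // number of characters of r found so far
        for (int i = 0; i < s.length() && matched < r.length(); i++) {
            if (s.charAt(i) == r.charAt(matched)) {
                matched++;
            }
        }
        return matched == r.length();
    }

    // True if s is a common supersequence of every input string.
    static boolean isCommonSupersequence(String s, String... inputs) {
        for (String r : inputs) {
            if (!contains(s, r)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(contains("abcde", "ace")); // true
        System.out.println(contains("abcde", "aec")); // false
    }
}
```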
The "solution" I tried (and again, it's the wrong way of doing it) is:
public class Solution4
{
static boolean[][] map = null;
static int size = 0;
public static void main(String[] args)
{
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
Stack data = new Stack();
data.push(A);
data.push(B);
data.push(C);
Stack clone1 = data.clone();
Stack clone2 = data.clone();
int length = 26;
size = max_size(data);
System.out.println(size+" "+length);
map = new boolean[26][size];
char[] result = new char[size];
HashSet<String> chunks = new HashSet<String>();
while(!clone1.isEmpty())
{
String a = clone1.pop();
char[] residue = make_residue(a);
System.out.println("---");
System.out.println("OLD : "+a);
System.out.println("RESIDUE : "+String.valueOf(residue));
String[] r = String.valueOf(residue).split(" ");
for(int i=0; i<r.length; i++)
{
if(r[i].equals(" ")) continue;
//chunks.add(spaces.substring(0,i)+r[i]);
chunks.add(r[i]);
}
}
for(String chunk : chunks)
{
System.out.println("CHUNK : "+chunk);
}
}
static char[] make_residue(String candidate)
{
char[] result = new char[size];
for(int i=0; i<candidate.length(); i++)
{
int pos = find_position_for(candidate.charAt(i),i);
for(int j=i; j<pos; j++) result[j]=' ';
if(pos==-1) result[candidate.length()-1] = candidate.charAt(i);
else result[pos] = candidate.charAt(i);
}
return result;
}
static int find_position_for(char character, int offset)
{
character-=((int)'a');
for(int i=offset; i<size; i++)
{
// System.out.println("checking "+String.valueOf((char)(character+((int)'a')))+" at "+i);
if(!map[character][i])
{
map[character][i]=true;
return i;
}
}
return -1;
}
static String move_right(String a, int from)
{
return a.substring(0, from)+" "+a.substring(from);
}
static boolean taken(int character, int position)
{ return map[character][position]; }
static void take(char character, int position)
{
//System.out.println("taking "+String.valueOf(character)+" at "+position+" (char_index-"+(character-((int)'a'))+")");
map[character-((int)'a')][position]=true;
}
static int max_size(Stack stack)
{
int max=0;
while(!stack.isEmpty())
{
String s = stack.pop();
if(s.length()>max) max=s.length();
}
return max;
}
}
Finding any common supersequence is not a difficult task.
For your example, a possible solution would be something like:
public class SuperSequenceTest {
public static void main(String[] args) {
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
int iA = 0;
int iB = 0;
int iC = 0;
char[] a = A.toCharArray();
char[] b = B.toCharArray();
char[] c = C.toCharArray();
StringBuilder sb = new StringBuilder();
while (iA < a.length || iB < b.length || iC < c.length) {
if (iA < a.length && iB < b.length && iC < c.length && (a[iA] == b[iB]) && (a[iA] == c[iC])) {
sb.append(a[iA]);
iA++;
iB++;
iC++;
}
else if (iA < a.length && iB < b.length && a[iA] == b[iB]) {
sb.append(a[iA]);
iA++;
iB++;
}
else if (iA < a.length && iC < c.length && a[iA] == c[iC]) {
sb.append(a[iA]);
iA++;
iC++;
}
else if (iB < b.length && iC < c.length && b[iB] == c[iC]) {
sb.append(b[iB]);
iB++;
iC++;
} else {
if (iC < c.length) {
sb.append(c[iC]);
iC++;
}
else if (iB < b.length) {
sb.append(b[iB]);
iB++;
} else if (iA < a.length) {
sb.append(a[iA]);
iA++;
}
}
}
System.out.println("SUPERSEQUENCE " + sb.toString());
}
}
However, the real problem to solve is the well-known Shortest Common Supersequence problem (http://en.wikipedia.org/wiki/Shortest_common_supersequence),
which is not that easy.
A lot of research addresses the topic.
See for instance:
http://www.csd.uwo.ca/~lila/pdfs/Towards%20a%20DNA%20solution%20to%20the%20Shortest%20Common%20Superstring%20Problem.pdf
http://www.ncbi.nlm.nih.gov/pubmed/14534185
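For exactly two strings the shortest length is easy to compute: it is the sum of both lengths minus the length of their longest common subsequence. The sketch below shows only that two-string case, with hypothetical names; it does not solve the three-string problem from the question, and merging the strings pairwise this way is not guaranteed to give the shortest overall result.

```java
class ScsTwoStrings {
    // Classic dynamic programme for the length of the longest common subsequence.
    static int lcsLength(String x, String y) {
        int[][] dp = new int[x.length() + 1][y.length() + 1];
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[x.length()][y.length()];
    }

    // Characters shared through the LCS are written once instead of twice.
    static int scsLength(String x, String y) {
        return x.length() + y.length() - lcsLength(x, y);
    }

    public static void main(String[] args) {
        System.out.println(scsLength("abac", "cab")); // 5, for example "cabac"
    }
}
```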
You can try finding the shortest combination like this:
static final char[] CHARS = "acgt".toCharArray();
public static void main(String[] ignored) {
String A = "caagccacctacatca";
String B = "cgagccatccgtaaagttg";
String C = "agaacctgctaaatgctaga";
String expected = "cagagaccatgccgtaaatgcattacga";
List<String> ABC = new Combination(A, B, C).findShortest();
System.out.println("expected: " + expected.length());
System.out.println("Merged: " + ABC.get(0).length() + " " + ABC);
}
static class Combination {
int shortest = Integer.MAX_VALUE;
List<String> shortestStr = new ArrayList<>();
char[][] chars;
int[] pos;
int count = 0;
Combination(String... strs) {
chars = new char[strs.length][];
pos = new int[strs.length];
for (int i = 0; i < strs.length; i++) {
chars[i] = strs[i].toCharArray();
}
}
public List<String> findShortest() {
findShortest0(new StringBuilder(), pos);
return shortestStr;
}
private void findShortest0(StringBuilder sb, int[] pos) {
if (allDone(pos)) {
if (sb.length() < shortest) {
shortestStr.clear();
shortest = sb.length();
}
if (sb.length() <= shortest)
shortestStr.add(sb.toString());
count++;
if (++count % 100 == 1)
System.out.println("Searched " + count + " shortest " + shortest);
return;
}
if (sb.length() + maxLeft(pos) > shortest)
return;
int[] pos2 = new int[pos.length];
int i = sb.length();
sb.append(' ');
for (char c : CHARS) {
if (!tryChar(pos, pos2, c)) continue;
sb.setCharAt(i, c);
findShortest0(sb, pos2);
}
sb.setLength(i);
}
private int maxLeft(int[] pos) {
int maxLeft = 0;
for (int i = 0; i < pos.length; i++) {
int left = chars[i].length - pos[i];
if (left > maxLeft)
maxLeft = left;
}
return maxLeft;
}
private boolean allDone(int[] pos) {
for (int i = 0; i < chars.length; i++)
if (pos[i] < chars[i].length)
return false;
return true;
}
private boolean tryChar(int[] pos, int[] pos2, char c) {
boolean matched = false;
for (int i = 0; i < chars.length; i++) {
pos2[i] = pos[i];
if (pos[i] >= chars[i].length) continue;
if (chars[i][pos[i]] == c) {
pos2[i]++;
matched = true;
}
}
return matched;
}
}
This prints many solutions that are shorter than the one suggested:
expected: 28
Merged: 27 [acgaagccatccgctaaatgctatcga, acgaagccatccgctaaatgctatgca, acgaagccatccgctaacagtgctaga, acgaagccatccgctaacatgctatga, acgaagccatccgctaacatgcttaga, acgaagccatccgctaacatgtctaga, acgaagccatccgctacaagtgctaga, acgaagccatccgctacaatgctatga, acgaagccatccgctacaatgcttaga, acgaagccatccgctacaatgtctaga, acgaagccatcgcgtaaatgctatcga, acgaagccatcgcgtaaatgctatgca, acgaagccatcgcgtaacagtgctaga, acgaagccatcgcgtaacatgctatga, acgaagccatcgcgtaacatgcttaga, acgaagccatcgcgtaacatgtctaga, acgaagccatcgcgtacaagtgctaga, acgaagccatcgcgtacaatgctatga, acgaagccatcgcgtacaatgcttaga, acgaagccatcgcgtacaatgtctaga, acgaagccatgccgtaaatgctatcga, acgaagccatgccgtaaatgctatgca, acgaagccatgccgtaacagtgctaga, acgaagccatgccgtaacatgctatga, acgaagccatgccgtaacatgcttaga, acgaagccatgccgtaacatgtctaga, acgaagccatgccgtacaagtgctaga, acgaagccatgccgtacaatgctatga, acgaagccatgccgtacaatgcttaga, acgaagccatgccgtacaatgtctaga, cagaagccatccgctaaatgctatcga, cagaagccatccgctaaatgctatgca, cagaagccatccgctaacagtgctaga, cagaagccatccgctaacatgctatga, cagaagccatccgctaacatgcttaga, cagaagccatccgctaacatgtctaga, cagaagccatccgctacaagtgctaga, cagaagccatccgctacaatgctatga, cagaagccatccgctacaatgcttaga, cagaagccatccgctacaatgtctaga, cagaagccatcgcgtaaatgctatcga, cagaagccatcgcgtaaatgctatgca, cagaagccatcgcgtaacagtgctaga, cagaagccatcgcgtaacatgctatga, cagaagccatcgcgtaacatgcttaga, cagaagccatcgcgtaacatgtctaga, cagaagccatcgcgtacaagtgctaga, cagaagccatcgcgtacaatgctatga, cagaagccatcgcgtacaatgcttaga, cagaagccatcgcgtacaatgtctaga, cagaagccatgccgtaaatgctatcga, cagaagccatgccgtaaatgctatgca, cagaagccatgccgtaacagtgctaga, cagaagccatgccgtaacatgctatga, cagaagccatgccgtaacatgcttaga, cagaagccatgccgtaacatgtctaga, cagaagccatgccgtacaagtgctaga, cagaagccatgccgtacaatgctatga, cagaagccatgccgtacaatgcttaga, cagaagccatgccgtacaatgtctaga, cagagaccatccgctaaatgctatcga, cagagaccatccgctaaatgctatgca, cagagaccatccgctaacagtgctaga, cagagaccatccgctaacatgctatga, cagagaccatccgctaacatgcttaga, cagagaccatccgctaacatgtctaga, cagagaccatccgctacaagtgctaga, cagagaccatccgctacaatgctatga, 
cagagaccatccgctacaatgcttaga, cagagaccatccgctacaatgtctaga, cagagaccatcgcgtaaatgctatcga, cagagaccatcgcgtaaatgctatgca, cagagaccatcgcgtaacagtgctaga, cagagaccatcgcgtaacatgctatga, cagagaccatcgcgtaacatgcttaga, cagagaccatcgcgtaacatgtctaga, cagagaccatcgcgtacaagtgctaga, cagagaccatcgcgtacaatgctatga, cagagaccatcgcgtacaatgcttaga, cagagaccatcgcgtacaatgtctaga, cagagaccatgccgtaaatgctatcga, cagagaccatgccgtaaatgctatgca, cagagaccatgccgtaacagtgctaga, cagagaccatgccgtaacatgctatga, cagagaccatgccgtaacatgcttaga, cagagaccatgccgtaacatgtctaga, cagagaccatgccgtacaagtgctaga, cagagaccatgccgtacaatgctatga, cagagaccatgccgtacaatgcttaga, cagagaccatgccgtacaatgtctaga, cagagccatcctagctaaagtgctaga, cagagccatcctagctaaatgctatga, cagagccatcctagctaaatgcttaga, cagagccatcctagctaaatgtctaga, cagagccatcctgactaaagtgctaga, cagagccatcctgactaaatgctatga, cagagccatcctgactaaatgcttaga, cagagccatcctgactaaatgtctaga, cagagccatcctgctaaatgctatcga, cagagccatcctgctaaatgctatgca, cagagccatcctgctaacagtgctaga, cagagccatcctgctaacatgctatga, cagagccatcctgctaacatgcttaga, cagagccatcctgctaacatgtctaga, cagagccatcctgctacaagtgctaga, cagagccatcctgctacaatgctatga, cagagccatcctgctacaatgcttaga, cagagccatcctgctacaatgtctaga]