Find Shortest Part of Sentence containing given words - java

Example:
Given the sentence:
My name is not eugene. my pet name is not eugene.
we have to find the smallest part of the sentence that contains all of the given words:
my and eugene
The answer is then:
eugene. my
Case, special characters, and digits can be ignored when matching.
I have pasted my code below, but it gives a wrong answer for some test cases.
Does anyone have an idea what the problem with the code is? I don't have the test case for which it fails.
import java.io.*;
import java.util.*;
public class ShortestSegment
{
static String[] pas;
static String[] words;
static int k,st,en,fst,fen,match,d;
static boolean found=false;
static int[] loc;
static boolean[] matches ;
public static void main(String s[]) throws IOException
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
pas = in.readLine().replaceAll("[^A-Za-z ]", "").split(" ");
k = Integer.parseInt(in.readLine());
words = new String[k];
matches = new boolean[k];
loc = new int[k];
for(int i=0;i<k;i++)
{
words[i] = in.readLine();
}
en = fen = pas.length;
find(0);
if(found==false)
System.out.println("NO SUBSEGMENT FOUND");
else
{
for(int j=fst;j<=fen;j++)
System.out.print(pas[j]+" ");
}
}
private static void find(int min)
{
if(min==pas.length)
return;
for(int i=0;i<k;i++)
{
if(pas[min].equalsIgnoreCase(words[i]))
{
if(matches[i]==false)
{
loc[i]=min;
matches[i] =true;
match++;
}
else
{
loc[i]=min;
}
if(match==k)
{
en=min;
st = min();
found=true;
if((fen-fst)>(en-st))
{
fen=en;
fst=st;
}
match--;
matches[getIdx()]=false;
}
}
}
find(min+1);
}
private static int getIdx()
{
for(int i=0;i<k;i++)
{
if(words[i].equalsIgnoreCase(pas[st]))
return i;
}
return -1;
}
private static int min()
{
int min=loc[0];
for(int i=1;i<loc.length;i++)
if(min>loc[i])
min=loc[i];
return min;
}
}

The code you've given produces incorrect output for the following input. I'm assuming that word length also matters when you want to 'Find Shortest Part of Sentence containing given words'.
String: 'My firstname is eugene. My fn is eugene.'
Number of search strings: 2
string1: 'my'
string2: 'is'
Your solution is: 'My firstname is'
The correct answer is: 'My fn is'
The problem in your code is that it treats 'firstname' and 'fn' as having the same length. The comparison (fen-fst)>(en-st) only checks whether the number of words has decreased, not whether the segment has become shorter in characters.
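One way to address this is a sliding window over the words that tracks the window's length in characters rather than in words. The following is a minimal sketch, not the asker's method. The class name `ShortestSegmentSketch` and the helpers `shortest` and `normalize` are made up for illustration, and it assumes that words are separated by whitespace and that case and non-letter characters are ignored when matching.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShortestSegmentSketch {

    // Keep letters only and lower-case them, so "Eugene." matches "eugene".
    static String normalize(String word) {
        return word.replaceAll("[^A-Za-z]", "").toLowerCase();
    }

    // Returns the run of consecutive words with the fewest characters that
    // contains every target word, or null if no such run exists.
    static String shortest(String text, List<String> targets) {
        String[] tokens = text.trim().split("\\s+");
        Map<String, Integer> inWindow = new HashMap<>(); // target -> occurrences inside the window
        for (String t : targets) {
            inWindow.put(normalize(t), 0);
        }
        if (inWindow.isEmpty()) {
            return null; // nothing to search for
        }
        int covered = 0;  // distinct targets currently inside the window
        int left = 0;     // index of the first token in the window
        int winChars = 0; // characters in the window, counting one separator per token
        int bestLeft = -1, bestRight = -1, bestChars = Integer.MAX_VALUE;

        for (int right = 0; right < tokens.length; right++) {
            winChars += tokens[right].length() + 1;
            String in = normalize(tokens[right]);
            if (inWindow.containsKey(in) && inWindow.merge(in, 1, Integer::sum) == 1) {
                covered++;
            }
            // Shrink from the left while the window still contains every target.
            while (covered == inWindow.size()) {
                if (winChars < bestChars) {
                    bestChars = winChars;
                    bestLeft = left;
                    bestRight = right;
                }
                String out = normalize(tokens[left]);
                if (inWindow.containsKey(out) && inWindow.merge(out, -1, Integer::sum) == 0) {
                    covered--;
                }
                winChars -= tokens[left].length() + 1;
                left++;
            }
        }
        return bestLeft < 0 ? null
                : String.join(" ", Arrays.copyOfRange(tokens, bestLeft, bestRight + 1));
    }
}
```

With the sentence from the question and the words my and eugene this returns "eugene. my"; with the counterexample above and the words my and is it returns "My fn is". Runs of whitespace in the input are collapsed to single spaces in the result.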

The following code (JUnit):
@Test
public void testIt() {
final String s = "My name is not eugene. my pet name is not eugene.";
final String tmp = s.toLowerCase().replaceAll("[^a-zA-Z]", " ");//here we need the placeholder (blank)
final String w1 = "my "; // leave a blank at the end to avoid those words e.g. "myself", "myth"..
final String w2 = "eugene ";//same as above
final List<Integer> l1 = getList(tmp, w1); //indexes list
final List<Integer> l2 = getList(tmp, w2);
int min = Integer.MAX_VALUE;
final int[] idx = new int[] { 0, 0 };
//loop to find out the result
for (final int i : l1) {
for (final int j : l2) {
if (Math.abs(j - i) < min) {
final int x = j - i;
min = Math.abs(j - i);
idx[0] = j - i > 0 ? i : j;
idx[1] = j - i > 0 ? j + w2.length() + 2 : i + w1.length() + 2;
}
}
}
System.out.println("indexes: " + Arrays.toString(idx));
System.out.println("result: " + s.substring(idx[0], idx[1]));
}
private List<Integer> getList(final String input, final String search) {
String t = new String(input);
final List<Integer> list = new ArrayList<Integer>();
int tmp = 0;
while (t.length() > 0) {
final int x = t.indexOf(search);
if (x < 0 || x > t.length()) {
break;
}
tmp += x;
list.add(tmp);
t = t.substring(search.length() + x);
}
return list;
}
gives the output:
indexes: [15, 25]
result: eugene. my
I think the code with its inline comments is pretty easy to understand. Basically, it plays with index + word length.
Note:
The "Not Found" case is not implemented.
The code just shows the idea; it can be optimized, e.g. at least one abs() call could be saved.
Hope it helps.

I think it can be handled another way:
First find a match, then narrow the bounds to that match and search again inside it. It can be coded as follows:
/**This method intends to check the shortest interval between two words
* @param s : the string to be processed
* @param first : one of the words
* @param second : one of the words
*/
public static void getShortestInterval(String s , String first , String second)
{
String situationOne = first + "(.*?)" + second;
String situationTwo = second + "(.*?)" + first;
Pattern patternOne = Pattern.compile(situationOne,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
Pattern patternTwo = Pattern.compile(situationTwo,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
List<Integer> result = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
/**first , test the first choice*/
Matcher matcherOne = patternOne.matcher(s);
findTheMax(first.length(),matcherOne, result);
/**then , test the second choice*/
Matcher matcherTwo = patternTwo.matcher(s);
findTheMax(second.length(),matcherTwo,result);
if(result.get(0)!=Integer.MAX_VALUE)
{
System.out.println("The shortest length is " + result.get(0));
System.out.println("Which start @ " + result.get(1));
System.out.println("And end @ " + result.get(2));
}else
System.out.println("No matching result is found!");
}
private static void findTheMax(int headLength , Matcher matcher , List<Integer> result)
{
int length = result.get(0);
int startIndex = result.get(1);
int endIndex = result.get(2);
while(matcher.find())
{
int temp = matcher.group(1).length();
int start = matcher.start();
List<Integer> minimize = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
System.out.println(matcher.group().substring(headLength));
findTheMax(headLength, matcher.pattern().matcher(matcher.group().substring(headLength)), minimize);
if(minimize.get(0) != Integer.MAX_VALUE)
{
start = start + minimize.get(1) + headLength;
temp = minimize.get(0);
}
if(temp<length)
{
length = temp;
startIndex = start;
endIndex = matcher.end();
}
}
result.set(0, length);
result.set(1, startIndex);
result.set(2, endIndex);
}
Note that this handles both cases, regardless of the order in which the two words appear.

You can use the Knuth-Morris-Pratt (KMP) algorithm to find the indexes of all occurrences of every given word in your text. Imagine you have a text of length N and M words (w1 ... wM). Using the KMP algorithm you can build an array:
occur = string[N];
occur[i] = 1, if w1 starts at position i
...
occur[i] = M, if wM starts at position i
occur[i] = 0, if no word from w1...wM starts at position i
You then loop through this array and, from every non-zero position, search forward for the other M-1 words.
This is approximate pseudocode, just to convey the idea. It definitely won't work if you simply transcribe it into Java:
for i=0 to N-1 {
if occur[i] != 0 {
for j = i + w[occur[i] - 1].length - 1 { // searching forward
if occur[j] != 0 and !foundWords.contains(occur[j]) {
foundWords.add(occur[j]);
lastWordInd = j;
if foundWords.containAllWords() break;
}
foundTextPieceLen = j + w[occur[lastWordInd]].length - i;
if foundTextPieceLen < minTextPieceLen {
minTextPieceLen = foundTextPieceLen;
// also remember start and end indexes of text piece
}
}
}
}
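The first step described above, collecting every start index of a word with Knuth-Morris-Pratt, could look roughly like this. It is a sketch of the standard algorithm only. The class and method names (`KmpSketch`, `failureTable`, `occurrences`) are illustrative, and building the `occur` array and the forward scan from the pseudocode are left out.

```java
import java.util.ArrayList;
import java.util.List;

class KmpSketch {

    // fail[i] = length of the longest proper prefix of p[0..i] that is also a suffix of it.
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Start indexes of all (possibly overlapping) occurrences of p in text.
    static List<Integer> occurrences(String text, String p) {
        List<Integer> hits = new ArrayList<>();
        if (p.isEmpty()) {
            return hits; // an empty pattern is treated as "no occurrences" here
        }
        int[] fail = failureTable(p);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == p.charAt(k)) {
                k++;
            }
            if (k == p.length()) {
                hits.add(i - k + 1);
                k = fail[k - 1]; // keep going to find overlapping matches
            }
        }
        return hits;
    }
}
```

Note that this matches raw substrings, so "my" would also be found inside "myth"; whole-word matching would need an extra boundary check on each hit.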

Related

How to sort a array that contains special characters alphabetically?

I am looking for code that produces the following output in standard output from the following string prepared according to a certain format.
Assumptions and rules:
Each letter is used 2 times in the given string and the letters between the same 2 letters are to be considered child letters.
The given string is always given in proper format. The string format
does not need to be checked.
Example:
Input : abccbdeeda
Expected output:
a
--b
----c
--d
----e
Explanation: since the 2 letters "b" occur between the letters "a", the letter b takes 2 hyphens (--b)
Attempt
public static void main(String[] args) {
String input = "abccbdeeda";
System.out.println("input: " + input);
String[] strSplit = input.split("");
String g = "";
String h = "-";
ArrayList<String> list = new ArrayList<String>();
int counter = 1;
boolean secondNumber;
list.add(strSplit[0]);
int dual = 0;
for (int i = 1; i < strSplit.length; i++) {
secondNumber = list.contains(strSplit[i]);
if ((secondNumber)) {
counter--;
dual = counter * 2;
for (int f = 0; f < dual; f++) {
strSplit[i] = h.concat(strSplit[i]);
}
g = "";
dual = 0;
} else {
list.add(strSplit[i]);
counter++;
}
}
Arrays.sort(strSplit);
for (int p = 0; p < strSplit.length; p++) {
System.out.println(strSplit[p]);
}
}
input: abccbdeeda
My output:
----c
----e
--b
--d
a
I wasn't able to sort the output alphabetically. How can I sort alphabetically with those hyphen characters in them?
This task is nicely done with the help of a stack. If the current character equals the top of the stack, that character is being closed and can be popped. Otherwise we are meeting it for the first time: it is appended to the result, preceded by stack.size() * 2 dashes, and pushed onto the stack.
Once the string has been completely traversed, the resulting string can be sorted.
public static void main(String[] args) {
Stack<Character> stack = new Stack<>();
String string = "abccbdeeda";
StringBuilder result = new StringBuilder();
for(int i = 0; i < string.length(); i++) {
char curChar = string.charAt(i);
if(!stack.isEmpty() && curChar == stack.peek()) {
stack.pop();
} else {
result.append("-".repeat(stack.size() * 2)).append(curChar).append(" ");
stack.add(curChar);
}
}
System.out.println(result);
System.out.println(Arrays.toString(Arrays.stream(result.toString().split(" ")).sorted().toArray()));
}
Output
a --b ----c --d ----e
[----c, ----e, --b, --d, a]
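Note that the first, unsorted line of that output already lists the letters in the order the question expects, so no sorting is needed to get the expected output. Below is a small variation of the same stack idea that collects one line per letter. It is a sketch that assumes well-formed input, as the question states; the class name `NestedLetters` and the method `render` are made up for illustration, and String.repeat needs Java 11 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedLetters {

    // One output line per letter, indented by two hyphens per nesting level.
    static List<String> render(String input) {
        Deque<Character> open = new ArrayDeque<>(); // letters seen once and not yet closed
        List<String> lines = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (!open.isEmpty() && open.peek() == c) {
                open.pop(); // second occurrence closes the letter
            } else {
                lines.add("-".repeat(open.size() * 2) + c);
                open.push(c);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        render("abccbdeeda").forEach(System.out::println);
    }
}
```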
You can go through the strSplit array and extract the character in each element into a separate list or array. To check whether an array element contains a letter, you can write a regular expression.
Ex: private final Pattern x = Pattern.compile("[a-z]");
Write a separate method to match the pattern against each element in the strSplit array. This method returns the character from your input string.
private String findCharacter(final StringBuilder element) {
final Matcher matcher = x.matcher(element);
if (matcher.find()) {
final int matchIndex = matcher.start(); //this gives the index of the char in the string
return element.substring(matchIndex);
}
return null; // no letter found in this element
}
Add these returned characters to a separate array and sort it using a sorting function.
Suppose your result list is:
List<String> resultList = Arrays.asList("----c", "----e", "--b", "--d", "a");
You can sort it alphabetically in a single line, by comparing the reversed strings so that the letter comes before the hyphens:
Collections.sort(resultList, (o1, o2) -> new StringBuilder(o1).reverse().toString().compareTo(new StringBuilder(o2).reverse().toString()));
You can use recursion for a depth-first traversal (preorder):
public static String dfs(String string, String prefix) {
if (string.length() == 0) return "";
int i = string.indexOf(string.charAt(0), 1);
return prefix + string.charAt(0) + "\n" // current
+ dfs(string.substring(1, i), prefix + "--") // all nested
+ dfs(string.substring(i + 1), prefix); // all siblings
}
Example call:
public static void main(String[] args) {
System.out.println(dfs("abccbdeeda", ""));
}

How to find distance between two anagrams in a string

Requirement: given a string, find the distance between all occurrences of anagrams of a pattern.
Example: "programmerxxddporragmmerbbffprogrammer"
String pat = "programmer";
Expected output: 4
The distance between two anagrams of "programmer" is 4.
//Java program to search all anagrams
//of a pattern in a text
public class Pattern
{
static final int MAX = 256;
// This function returns true if contents
// of arr1[] and arr2[] are same, otherwise
// false.
static boolean compare(char arr1[], char arr2[])
{
for (int i = 0; i < MAX; i++)
if (arr1[i] != arr2[i])
return false;
return true;
}
// This function search for all permutations
// of pat[] in txt[]
static void search(String pat, String txt)
{
int M = pat.length();
int N = txt.length();
// countP[]: Store count of all
// characters of pattern
// countTW[]: Store count of current
// window of text
char[] countP = new char[MAX];
char[] countTW = new char[MAX];
for (int i = 0; i < M; i++)
{
(countP[pat.charAt(i)])++;
(countTW[txt.charAt(i)])++;
}
// Traverse through remaining characters
// of pattern
for (int i = M; i < N; i++)
{
// Compare counts of current window
// of text with counts of pattern[]
if (compare(countP, countTW))
System.out.println("Found at Index " +
(i - M));
// Add current character to current
// window
(countTW[txt.charAt(i)])++;
// Remove the first character of previous
// window
countTW[txt.charAt(i-M)]--;
}
// Check for the last window in text
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(N - M));
System.out.println(N-M-M);
}
}
/* Driver program to test above function */
public static void main(String args[])
{
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
I am required to print the first instance of the distance between two anagrams; in my case, 4. My code is printing for the final string like this:
Your code was doing well at finding the anagrams, so I adapted it to track where it found the last anagram. When it finds a new one, it can compute the distance between them: the difference of the two start indices minus the length of the pattern. (I chose to report all distances, but you could suppress that.) Here is the code adapted to report the distance each time (the rest is unchanged).
//Java program to search all anagrams
//of a pattern in a text
public class Pattern{
static final int MAX = 256;
// This function returns true if contents
// of arr1[] and arr2[] are same, otherwise
// false.
static boolean compare(char arr1[], char arr2[])
{
for (int i = 0; i < MAX; i++)
if (arr1[i] != arr2[i])
return false;
return true;
}
// This function search for all permutations
// of pat[] in txt[]
static void search(String pat, String txt)
{
int M = pat.length();
int N = txt.length();
int lastFoundIndex = -1;
// countP[]: Store count of all
// characters of pattern
// countTW[]: Store count of current
// window of text
char[] countP = new char[MAX];
char[] countTW = new char[MAX];
for (int i = 0; i < M; i++)
{
(countP[pat.charAt(i)])++;
(countTW[txt.charAt(i)])++;
}
// Traverse through remaining characters
// of pattern
for (int i = M; i < N; i++)
{
// Compare counts of current window
// of text with counts of pattern[]
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(i - M));
if (lastFoundIndex==-1){
lastFoundIndex = i-M;}
else {
System.out.println("Distance between is: "+(i-M-lastFoundIndex-pat.length()));
lastFoundIndex = i-M;
}
}
// Add current character to current
// window
(countTW[txt.charAt(i)])++;
// Remove the first character of previous
// window
countTW[txt.charAt(i-M)]--;
}
// Check for the last window in text
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(N - M));
if (lastFoundIndex==-1){
lastFoundIndex = N-M;
}
else{
System.out.println("Distance between is: "+(N-M-lastFoundIndex-pat.length()));
lastFoundIndex = N-M;
}
}
}
/* Driver program to test above function */
public static void main(String args[])
{
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
output is
Found at Index 0
Found at Index 14
Distance between is: 4
Found at Index 28
Distance between is: 4
REVISION:
I then went back and implemented the whole thing differently, possibly less efficiently, but easier for me to follow (output was the same).
//Java program to search all anagrams
//of a pattern in a text
import java.util.Arrays;
public class Pattern {
// Method to sort a string alphabetically (from https://www.geeksforgeeks.org/sort-a-string-in-java-2-different-ways/)
public static String sortString(String inputString) {
// convert input string to char array
char tempArray[] = inputString.toCharArray();
// sort tempArray
Arrays.sort(tempArray);
// return new sorted string
return new String(tempArray);
}
// This function searches for all permutations
// of pat in txt
static void search(String pat, String txt) {
String patSorted = sortString(pat); //sort the pattern once
int M = pat.length();
int N = txt.length();
int lastFoundIndex = -1; //last place found
for (int i = 0; N - i >= M; i++) { //while there are still enough remaining characters from this index on in txt
if (sortString(txt.substring(i, i + M)).equals(patSorted)) {
System.out.println("Found at Index " + i);
if (lastFoundIndex == -1) {
lastFoundIndex = i;
} else {
System.out.println("Distance between is: " + (i - lastFoundIndex - M));
lastFoundIndex = i;
}
}
}
}
/* Driver program to test above function */
public static void main(String args[]) {
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
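For comparison, here is a sketch that keeps the linear-time sliding window but returns the start indexes and gaps instead of printing them, so the caller can take only the first gap. It is not the code from either answer. The names `AnagramGaps`, `shift`, `anagramStarts`, and `gaps` are invented for this example, and it uses an int array covering every char value rather than the 256-entry char array above.

```java
import java.util.ArrayList;
import java.util.List;

class AnagramGaps {

    // Adds delta (+1 or -1) to balance[c] and reports how the number of
    // non-zero entries changed: +1, -1, or 0.
    private static int shift(int[] balance, char c, int delta) {
        int before = balance[c];
        balance[c] += delta;
        if (before == 0) return 1;
        if (balance[c] == 0) return -1;
        return 0;
    }

    // Start indexes of every window of txt that is an anagram of pat.
    static List<Integer> anagramStarts(String txt, String pat) {
        List<Integer> starts = new ArrayList<>();
        int m = pat.length(), n = txt.length();
        if (m == 0 || m > n) return starts;
        int[] balance = new int[Character.MAX_VALUE + 1]; // pattern count minus window count
        int nonZero = 0;
        for (int i = 0; i < m; i++) {
            nonZero += shift(balance, pat.charAt(i), +1);
        }
        for (int i = 0; i < n; i++) {
            nonZero += shift(balance, txt.charAt(i), -1);         // character enters the window
            if (i >= m) {
                nonZero += shift(balance, txt.charAt(i - m), +1); // character leaves the window
            }
            if (i >= m - 1 && nonZero == 0) {
                starts.add(i - m + 1);
            }
        }
        return starts;
    }

    // Gap between each pair of consecutive matches (negative if they overlap).
    static List<Integer> gaps(String txt, String pat) {
        List<Integer> starts = anagramStarts(txt, pat);
        List<Integer> result = new ArrayList<>();
        for (int k = 1; k < starts.size(); k++) {
            result.add(starts.get(k) - starts.get(k - 1) - pat.length());
        }
        return result;
    }
}
```

For the text and pattern from the question, anagramStarts gives [0, 14, 28] and gaps gives [4, 4]; the first element of gaps is the value the question asks for.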

Find the longest word in a String

Following is my code:
String LongestWord(String a)
{
int lw=0;
int use;
String lon="";
while (!(a.isEmpty()))
{
a=a.trim();
use=a.indexOf(" ");
if (use<0)
{
break;
}
String cut=a.substring(0,use);
if(cut.length()>lw)
{
lon=cut;
}
lw=lon.length();
a=a.replace(cut," ");
}
return lon;
}
The problem is that when I input a string like,
"a boy is playing in the park"
it returns the longest word as "ying" because when it replaces 'cut' with " " for the first time, it removes all the 'a'-s too, such that it becomes
" boy is pl ying in the p rk" after the first iteration of the loop
What is wrong, and how can I fix it?
Thanks in advance!
You have already identified the problem: the program performs unwanted replacements.
Therefore, stop replacing.
In the program below, the examined word is cut off with substring instead of using the harmful replace.
String LongestWord(String a)
{
int lw=0;
int use;
String lon="";
while (!(a.isEmpty()))
{
a=a.trim();
use=a.indexOf(" ");
if (use<0)
{
if(a.length()>lw) // no space left: the remainder is the last word, so compare it too
{
lon=a;
}
break;
}
String cut=a.substring(0,use);
if(cut.length()>lw)
{
lon=cut;
}
lw=lon.length();
a=a.substring(use+1); // cut the word instead of doing harmful replacement
}
return lon;
}
You can use the split function to get an array of strings.
Then loop over that array to find the longest string and return it.
String LongestWord(String a) {
String[] parts = a.split(" ");
String longest = null;
for (String part : parts) {
if (longest == null || longest.length() < part.length()) {
longest = part;
}
}
return longest;
}
I would use arrays:
String[] parts = a.split(" ");
Then you can loop over parts; for each element (a string) you can check its length:
parts[i].length()
and find longest one.
I would use a Scanner to do this
String s = "the boy is playing in the park";
int length = 0;
String word = "";
Scanner scan = new Scanner(s);
while(scan.hasNext()){
String temp = scan.next();
int tempLength = temp.length();
if(tempLength > length){
length = tempLength;
word = temp;
}
}
You check the length of each word; if it is longer than all the previous ones, you store that word in the String word.
Another way uses Streams.
Optional<String> max = Arrays.stream("a boy is playing in the park"
.split(" "))
.max((a, b) -> a.length() - b.length());
System.out.println("max = " + max);
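A variation on the stream approach, as a sketch: it splits on runs of whitespace, skips empty tokens, and returns an empty Optional for blank input instead of an empty word. The class name `LongestWordSketch` is made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

class LongestWordSketch {

    // Longest whitespace-separated word, or Optional.empty() if there is none.
    static Optional<String> longestWord(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(longestWord("a boy is playing in the park").orElse("(none)"));
    }
}
```

Comparator.comparingInt avoids the subtraction in the comparator, which is harmless for string lengths but a common source of overflow bugs in general.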
If you are looking for a less trivial solution, you can solve it without split or map, using only one loop:
static String longestWorld(String pharagragh) {
int maxLength = 0;
String word=null,longestWorld = null;
int startIndexOfWord = 0, endIndexOfWord;
int wordLength = 0;
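for (int i = 0; i <= pharagragh.length(); i++) {
// treat the end of the string as a word boundary so that the last word is checked too
if (i == pharagragh.length() || pharagragh.charAt(i) == ' ') {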
endIndexOfWord = i;
wordLength = endIndexOfWord - startIndexOfWord;
word = pharagragh.substring(startIndexOfWord, endIndexOfWord);
startIndexOfWord = endIndexOfWord + 1;
if (wordLength > maxLength) {
maxLength = wordLength;
longestWorld = word;
}
}
}
return longestWorld;
}
Now let's test it:
System.out.println(longestWorld("Hello Stack Overflow Welcome to Challenge World"));// output is Challenge
Try :
package testlongestword;
/**
*
* @author XOR
*/
public class TestLongestWord{
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
System.out.println(LongestWord("a boy is playing in the park"));
}
public static String LongestWord(String str){
String[] words = str.split(" ");
int index = 0;
for(int i = 0; i < words.length; ++i){
final String current = words[i];
if(current.length() > words[index].length()){
index = i;
}
}
return words[index];
}
}

How to split string at every nth occurrence of character in Java

I would like to split a string at every 4th occurrence of a comma ,.
How to do this? Below is an example:
String str = "1,,,,,2,3,,1,,3,,";
Expected output:
array[0]: 1,,,,
array[1]: ,2,3,,
array[2]: 1,,3,,
I tried using Google Guava like this:
Iterable<String> splitdata = Splitter.fixedLength(4).split(str);
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
I also tried this:
String [] splitdata = str.split("(?<=\\G.{" + 4 + "})");
output: [1,,,, ,,2,, 3,,1, ,,3,, ,]
Yet this is not the output I want. I just want to split the string at every 4th occurrence of a comma.
Thanks.
Use two int variables. The first counts the commas: every time a ',' occurs it is incremented, and when it reaches 4 it is reset to 0. The second marks where the next piece starts. It begins at 0, and once a piece has been cut, the position after its end becomes the start of the next piece. Cut each piece with substring(startPoint, i+1) (i+1 because the end index of substring is exclusive and the comma at position i belongs to the piece), and finally add the piece to the array list. Here is some sample code. Hope this helps.
String str = "1,,,,,2,3,,1,,3,,";
int k = 0;
int startPoint = 0;
ArrayList<String> arrayList = new ArrayList<>();
for (int i = 0; i < str.length(); i++)
{
if (str.charAt(i) == ',')
{
k++;
if (k == 4)
{
String ab = str.substring(startPoint, i+1);
System.out.println(ab);
arrayList.add(ab);
startPoint = i+1;
k = 0;
}
}
}
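The counting loop above can be wrapped into a reusable method that also keeps any characters left over after the last complete group. This is a sketch only; the class name `NthSplit`, the method name `splitEveryNth`, and its parameters are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NthSplit {

    // Cuts s after every n-th occurrence of delim; a trailing remainder, if any,
    // becomes the last piece.
    static List<String> splitEveryNth(String s, char delim, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> pieces = new ArrayList<>();
        int seen = 0;  // delimiters counted since the last cut
        int start = 0; // index where the current piece begins
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim && ++seen == n) {
                pieces.add(s.substring(start, i + 1)); // end index is exclusive
                start = i + 1;
                seen = 0;
            }
        }
        if (start < s.length()) {
            pieces.add(s.substring(start)); // leftover after the last full group
        }
        return pieces;
    }
}
```

For the string from the question, splitEveryNth("1,,,,,2,3,,1,,3,,", ',', 4) yields the three expected pieces and no leftover.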
Here's a more flexible function, using an idea from this answer:
static List<String> splitAtNthOccurrence(String input, int n, String delimiter) {
List<String> pieces = new ArrayList<>();
// *? is the reluctant quantifier
String regex = Strings.repeat(".*?" + delimiter, n);
Matcher matcher = Pattern.compile(regex).matcher(input);
int lastEndOfMatch = -1;
while (matcher.find()) {
pieces.add(matcher.group());
lastEndOfMatch = matcher.end();
}
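// add the remainder only if something is left after the last match
if (lastEndOfMatch != -1 && lastEndOfMatch < input.length()) {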
pieces.add(input.substring(lastEndOfMatch));
}
return pieces;
}
This is how you call it using your example:
String input = "1,,,,,2,3,,1,,3,,";
List<String> pieces = splitAtNthOccurrence(input, 4, ",");
pieces.forEach(System.out::println);
// Output:
// 1,,,,
// ,2,3,,
// 1,,3,,
I use Strings.repeat from Guava.
Try this as well, if you want the result in an array:
String str = "1,,,,,2,3,,1,,3,,";
System.out.println(str);
char c[] = str.toCharArray();
int ptnCnt = 0;
for (char d : c) {
if(d==',')
ptnCnt++;
}
String result[] = new String[ptnCnt/4];
int i=-1;
int beginIndex = 0;
int cnt=0,loopcount=0;
for (char ele : c) {
loopcount++;
if(ele==',')
cnt++;
if(cnt==4){
cnt=0;
result[++i]=str.substring(beginIndex,loopcount);
beginIndex=loopcount;
}
}
for (String string : result) {
System.out.println(string);
}
This works well and was tested on Java 8:
public String[] split(String input,int at){
String[] out = new String[2];
String p = String.format("((?:[^/]*/){%s}[^/]*)/(.*)",at);
Pattern pat = Pattern.compile(p);
Matcher matcher = pat.matcher(input);
if (matcher.matches()) {
out[0] = matcher.group(1);// left
out[1] = matcher.group(2);// right
}
return out;
}
//Ex: D:/folder1/folder2/folder3/file1.txt
//if at = 2, group(1) = D:/folder1/folder2 and group(2) = folder3/file1.txt
The accepted solution above by Saqib Rezwan does not add the leftover string to the list. If it splits the string after every 4th comma and the string has 9 characters, the 9th character is left out and the returned list is wrong.
A complete solution would be:
private static ArrayList<String> splitStringAtNthOccurrence(String str, int n) {
int k = 0;
int startPoint = 0;
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == ',') {
k++;
if (k == n) {
String ab = str.substring(startPoint, i + 1);
list.add(ab);
startPoint = i + 1;
k = 0;
}
}
// if there is no comma left and there are still some character in the string
// add them to list
else if (!str.substring(i).contains(",")) {
list.add(str.substring(startPoint));
break;
}
}
return list;
}

detect incomplete patterns in strings

I have a string containing nested repeating patterns, for example:
String pattern1 = "1234";
String pattern2 = "5678";
String patternscombined = "1234|1234|5678|9"; // '|' added for readability
String pattern = (pattern1 + pattern1 + pattern2 + "9")
+(pattern1 + pattern1 + pattern2 + "9")
+(pattern1 + pattern1 + pattern2 + "9");
String result = "1234|1234|5678|9|1234|1234|56";
As you can see in the above example, the result got cut off. But when you know the repeating patterns, you can predict what could come next.
Now to my question:
How can I predict the next repetitions of this pattern, to get a resulting string like:
String predictedresult = "1234|1234|5678|9|1234|1234|5678|9|1234|1234|5678|9";
Patterns will be shorter than 10 characters; the predicted result will be shorter than 1000 characters.
I only receive the cut-off result string; a pattern recognition program is already implemented and working. In the above example, I would have result, pattern1, pattern2 and patternscombined.
EDIT:
I have found a solution working for me:
import java.util.Arrays;
public class LRS {
// return the longest common prefix of s and t
public static String lcp(String s, String t) {
int n = Math.min(s.length(), t.length());
for (int i = 0; i < n; i++) {
if (s.charAt(i) != t.charAt(i))
return s.substring(0, i);
}
return s.substring(0, n);
}
// return the longest repeated string in s
public static String lrs(String s) {
// form the N suffixes
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++) {
suffixes[i] = s.substring(i, N);
}
// sort them
Arrays.sort(suffixes);
// find longest repeated substring by comparing adjacent sorted suffixes
String lrs = "";
for (int i = 0; i < N - 1; i++) {
String x = lcp(suffixes[i], suffixes[i + 1]);
if (x.length() > lrs.length())
lrs = x;
}
return lrs;
}
public static int startingRepeats(final String haystack, final String needle)
{
String s = haystack;
final int len = needle.length();
if(len == 0){
return 0;
}
int count = 0;
while (s.startsWith(needle)) {
count++;
s = s.substring(len);
}
return count;
}
public static String lrscutoff(String s){
String lrs = s;
int length = s.length();
for (int i = length; i > 0; i--) {
String x = lrs(s.substring(0, i));
if (startingRepeats(s, x) < 10 &&
startingRepeats(s, x) > startingRepeats(s, lrs)){
lrs = x;
}
}
return lrs;
}
// repeatedly extract the leading repeated substring of a sample string
// and print the predicted continuation
public static void main(String[] args) {
long time = System.nanoTime();
long timemilis = System.currentTimeMillis();
String s = "12341234567891234123456789123412345";
String repeat = s;
while(repeat.length() > 0){
System.out.println("-------------------------");
String repeat2 = lrscutoff(repeat);
System.out.println("'" + repeat + "'");
int count = startingRepeats(repeat, repeat2);
String rest = repeat.substring(count*repeat2.length());
System.out.println("predicted: (rest ='" + rest + "')" );
while(count > 0){
System.out.print("'" + repeat2 + "' + ");
count--;
}
if(repeat.equals(repeat2)){
System.out.println("''");
break;
}
if(!rest.isEmpty() && repeat2.contains(rest)){
System.out.println("'" + repeat2 + "'");
}else{
System.out.println("'" + rest + "'");
}
repeat = repeat2;
}
System.out.println("Time: (nano+millis):");
System.out.println(System.nanoTime()-time);
System.out.println(System.currentTimeMillis()-timemilis);
}
}
If the string to predict always has its numbers separated by pipes (|), you can simply split on the pipe and keep track of the counts in a HashMap. For example:
1234 = 2
1344 = 1
4411 = 5
If not, you have to modify the Longest Repeated Substring algorithm. Since you need all repeated substrings, keep track of all of them instead of only the longest one. You also have to add checks for a minimum substring length and for overlapping substrings. Searching the web will turn up many references for this algorithm.
You seem to need something like an n-gram language model, which is a statistical model that is based on counts of co-occurring events. If you are given some training data, you can derive the probabilities from counts of seen patterns. If not, you can try to specify them manually, but this can get tricky. Once you have such a language model (where the digit patterns correspond to words), you can always predict the next word by picking one with the highest probability given some previous words ("history").
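For the simpler situation described in the question, where the combined repeating unit is already known, the prediction can be sketched as repeating that unit and checking that the truncated input is a prefix of the result. This is only an illustration under that assumption, not a pattern-discovery algorithm. `PatternPredictor`, `predict`, and the `totalRepeats` parameter are hypothetical names, and String.repeat needs Java 11 or later.

```java
class PatternPredictor {

    // Repeats the known unit totalRepeats times and verifies that the truncated
    // input is consistent with it, i.e. is a prefix of the repeated string.
    static String predict(String truncated, String unit, int totalRepeats) {
        String full = unit.repeat(totalRepeats);
        if (!full.startsWith(truncated)) {
            throw new IllegalArgumentException("input does not follow the given unit");
        }
        return full;
    }

    public static void main(String[] args) {
        String unit = "1234" + "1234" + "5678" + "9"; // patternscombined without the pipes
        System.out.println(predict("12341234567891234123456", unit, 3));
    }
}
```

The prefix check is what guards against a truncated string that does not actually follow the assumed unit.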
