How to find distance between two anagrams in a string - java

Requirement: given a string, find the distance between consecutive occurrences of anagrams of a pattern.
Example: "programmerxxddporragmmerbbffprogrammer"
String pat = "programmer";
Expected output: 4
The distance between two consecutive anagrams of "programmer" is 4.
//Java program to search all anagrams
//of a pattern in a text
public class Pattern
{
static final int MAX = 256;
// This function returns true if contents
// of arr1[] and arr2[] are same, otherwise
// false.
static boolean compare(char arr1[], char arr2[])
{
for (int i = 0; i < MAX; i++)
if (arr1[i] != arr2[i])
return false;
return true;
}
// This function search for all permutations
// of pat[] in txt[]
static void search(String pat, String txt)
{
int M = pat.length();
int N = txt.length();
// countP[]: Store count of all
// characters of pattern
// countTW[]: Store count of current
// window of text
char[] countP = new char[MAX];
char[] countTW = new char[MAX];
for (int i = 0; i < M; i++)
{
(countP[pat.charAt(i)])++;
(countTW[txt.charAt(i)])++;
}
// Traverse through remaining characters
// of pattern
for (int i = M; i < N; i++)
{
// Compare counts of current window
// of text with counts of pattern[]
if (compare(countP, countTW))
System.out.println("Found at Index " +
(i - M));
// Add current character to current
// window
(countTW[txt.charAt(i)])++;
// Remove the first character of previous
// window
countTW[txt.charAt(i-M)]--;
}
// Check for the last window in text
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(N - M));
System.out.println(N-M-M);
}
}
/* Driver program to test above function */
public static void main(String args[])
{
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
I am required to print the first distance between two anagrams, which is 4 in my case. My code prints for the final string like this:

Your code already finds the anagrams correctly. I adapted it to remember where it found the previous anagram, so that when it finds a new one it can compute the distance between them: the difference of the two start indices minus the length of the pattern. (I chose to report every distance, but you could suppress all but the first.) Here is the adapted code; the rest is unchanged.
//Java program to search all anagrams
//of a pattern in a text
public class Pattern{
static final int MAX = 256;
// This function returns true if contents
// of arr1[] and arr2[] are same, otherwise
// false.
static boolean compare(char arr1[], char arr2[])
{
for (int i = 0; i < MAX; i++)
if (arr1[i] != arr2[i])
return false;
return true;
}
// This function search for all permutations
// of pat[] in txt[]
static void search(String pat, String txt)
{
int M = pat.length();
int N = txt.length();
int lastFoundIndex = -1;
// countP[]: Store count of all
// characters of pattern
// countTW[]: Store count of current
// window of text
char[] countP = new char[MAX];
char[] countTW = new char[MAX];
for (int i = 0; i < M; i++)
{
(countP[pat.charAt(i)])++;
(countTW[txt.charAt(i)])++;
}
// Traverse through remaining characters
// of pattern
for (int i = M; i < N; i++)
{
// Compare counts of current window
// of text with counts of pattern[]
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(i - M));
if (lastFoundIndex==-1){
lastFoundIndex = i-M;}
else {
System.out.println("Distance between is: "+(i-M-lastFoundIndex-pat.length()));
lastFoundIndex = i-M;
}
}
// Add current character to current
// window
(countTW[txt.charAt(i)])++;
// Remove the first character of previous
// window
countTW[txt.charAt(i-M)]--;
}
// Check for the last window in text
if (compare(countP, countTW)) {
System.out.println("Found at Index " +
(N - M));
if (lastFoundIndex==-1){
lastFoundIndex = N-M;
}
else{
System.out.println("Distance between is: "+(N-M-lastFoundIndex-pat.length()));
lastFoundIndex = N-M;
}
}
}
/* Driver program to test above function */
public static void main(String args[])
{
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
output is
Found at Index 0
Found at Index 14
Distance between is: 4
Found at Index 28
Distance between is: 4
REVISION:
I then went back and implemented the whole thing differently, possibly less efficiently, but easier for me to follow (output was the same).
//Java program to search all anagrams
//of a pattern in a text
import java.util.Arrays;
public class Pattern {
// Method to sort a string alphabetically (from https://www.geeksforgeeks.org/sort-a-string-in-java-2-different-ways/)
public static String sortString(String inputString) {
// convert input string to char array
char tempArray[] = inputString.toCharArray();
// sort tempArray
Arrays.sort(tempArray);
// return new sorted string
return new String(tempArray);
}
// This function searches for all permutations
// of pat in txt
static void search(String pat, String txt) {
String patSorted = sortString(pat); //sort the pattern once
int M = pat.length();
int N = txt.length();
int lastFoundIndex = -1; //last place found
for (int i = 0; N - i >= M; i++) { //while there are still enough remaining characters from this index on in txt
if (sortString(txt.substring(i, i + M)).equals(patSorted)) {
System.out.println("Found at Index " + i);
if (lastFoundIndex == -1) {
lastFoundIndex = i;
} else {
System.out.println("Distance between is: " + (i - lastFoundIndex - M));
lastFoundIndex = i;
}
}
}
}
/* Driver program to test above function */
public static void main(String args[]) {
String txt = "programmerxxddporragmmerbbffprogrammer";
String pat = "programmer";
search(pat, txt);
}
}
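Both versions above print as they go. As a rough sketch of the same sliding-window idea that returns the distances instead, the following uses int counters rather than char counters and assumes every character code is below 256. The class name AnagramDistances and the method distances are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnagramDistances {
    // Returns the gap (in characters) between each pair of consecutive
    // anagram occurrences of pat in txt.
    // Assumption: all characters have codes below 256.
    static List<Integer> distances(String pat, String txt) {
        List<Integer> result = new ArrayList<>();
        int m = pat.length();
        int n = txt.length();
        if (m == 0 || n < m) {
            return result;
        }
        int[] countP = new int[256]; // character counts of the pattern
        int[] countW = new int[256]; // character counts of the current window
        for (int i = 0; i < m; i++) {
            countP[pat.charAt(i)]++;
            countW[txt.charAt(i)]++;
        }
        int last = -1; // start index of the previous match, -1 if none yet
        for (int start = 0; ; start++) {
            if (Arrays.equals(countP, countW)) {
                if (last != -1) {
                    result.add(start - last - m);
                }
                last = start;
            }
            if (start + m >= n) {
                break; // this was the last window
            }
            countW[txt.charAt(start + m)]++; // character entering the window
            countW[txt.charAt(start)]--;     // character leaving the window
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distances("programmer",
                "programmerxxddporragmmerbbffprogrammer"));
    }
}
```

To print only the first distance, as the question asks, take the first element of the returned list if it is not empty.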

Palindrome in java

Here is what I tried.
The sample input is "aabaa".
For example, the if condition compares the first character with the last one (index 0 with index 4).
If they are equal I increment a counter; if the counter reaches half the length of the original string, the string is a palindrome.
Otherwise it is not a palindrome.
I wrote this with my basic knowledge of Java; if there are any errors, let me know.
boolean solution(String inputString) {
int val = inputString.length();
int count = 0;
for (int i = 0; i<inputString.length(); i++) {
if(inputString.charAt(i) == inputString.charAt(val-i)) {
count = count++;
if (count>0) {
return true;
}
}
}
return true;
}
How about
public boolean isPalindrome(String text) {
String clean = text.replaceAll("\\s+", "").toLowerCase();
int length = clean.length();
int forward = 0;
int backward = length - 1;
while (backward > forward) {
char forwardChar = clean.charAt(forward++);
char backwardChar = clean.charAt(backward--);
if (forwardChar != backwardChar)
return false;
}
return true;
}
From here
In your version you compare the first element with the last, the second with the second-to-last, and so on.
The last element is at index inputString.length()-1, so you need to use inputString.charAt(val-i-1). If you iterate to the end, the count should equal the length of the string.
for(int i = 0; i<inputString.length(); i++){
if(inputString.charAt(i) == inputString.charAt(val-i-1)){
count ++;
}
}
return (count==val); //true when count=val
Or alternatively, iterate only to the midpoint of the string; then the count should equal val/2.
for(int i = 0; i<inputString.length()/2; i++){
if(inputString.charAt(i) == inputString.charAt(val-i-1)){
count ++;
}
}
return (count==val/2); //true when count=val/2
There are no constraints in the question, so let me throw in a cheesier solution.
boolean isPalindrome(String in) {
final String inl = in.toLowerCase();
return new StringBuilder(inl).reverse().toString().equals(inl);
}
A palindrome is a word, sentence, verse, or even a number that reads the same forward and backward. In this Java solution, we'll see how to check whether a string or a number is a palindrome.
Method - 1
class Main {
public static void main(String[] args) {
String str = "Nitin", revStr = "";
int strLen = str.length();
for (int i = (strLen - 1); i >=0; --i) {
revStr = revStr + str.charAt(i);
}
if (str.toLowerCase().equals(revStr.toLowerCase())) {
System.out.println(str + " is a Palindrome String.");
}
else {
System.out.println(str + " is not a Palindrome String.");
}
}
}
Method - 2
class Main {
public static void main(String[] args) {
int n = 3553, revNum = 0, rem;
// store the number to the original number
int orgNum = n;
/* get the reverse of original number
store it in variable */
while (n != 0) {
rem = n % 10;
revNum = revNum * 10 + rem;
n /= 10;
}
// check if reversed number and original number are equal
if (orgNum == revNum) {
System.out.println(orgNum + " is Palindrome.");
}
else {
System.out.println(orgNum + " is not Palindrome.");
}
}
}

How to sort an array that contains special characters alphabetically?

I am looking for code that produces the following output in standard output from the following string prepared according to a certain format.
Assumptions and rules:
Each letter is used 2 times in the given string and the letters between the same 2 letters are to be considered child letters.
The given string is always in the proper format, so the format does not need to be checked.
Example:
Input : abccbdeeda
Expected output:
a
--b
----c
--d
----e
Explanation: since the 2 letters "b" occur between the letters "a", the letter b takes 2 hyphens (--b)
Attempt
public static void main(String[] args) {
String input = "abccbdeeda";
System.out.println("input: " + input);
String[] strSplit = input.split("");
String g = "";
String h = "-";
ArrayList<String> list = new ArrayList<String>();
int counter = 1;
boolean secondNumber;
list.add(strSplit[0]);
int dual = 0;
for (int i = 1; i < strSplit.length; i++) {
secondNumber = list.contains(strSplit[i]);
if ((secondNumber)) {
counter--;
dual = counter * 2;
for (int f = 0; f < dual; f++) {
strSplit[i] = h.concat(strSplit[i]);
}
g = "";
dual = 0;
} else {
list.add(strSplit[i]);
counter++;
}
}
Arrays.sort(strSplit);
for (int p = 0; p < strSplit.length; p++) {
System.out.println(strSplit[p]);
}
}
input: abccbdeeda
My output:
----c
----e
--b
--d
a
I wasn't able to sort the output alphabetically. How can I sort alphabetically with those hyphen characters in them?
This task is nicely done with the help of a stack. If the current character equals the top of the stack, that character is being closed and can be popped. Otherwise we are seeing it for the first time, so we append it to the result, preceded by stack.size() * 2 dashes, and push it onto the stack.
Once the string has been fully traversed, we can sort the resulting entries.
public static void main(String[] args) {
Stack<Character> stack = new Stack<>();
String string = "abccbdeeda";
StringBuilder result = new StringBuilder();
for(int i = 0; i < string.length(); i++) {
char curChar = string.charAt(i);
if(!stack.isEmpty() && curChar == stack.peek()) {
stack.pop();
} else {
result.append("-".repeat(stack.size() * 2)).append(curChar).append(" ");
stack.add(curChar);
}
}
System.out.println(result);
System.out.println(Arrays.toString(Arrays.stream(result.toString().split(" ")).sorted().toArray()));
}
Output
a --b ----c --d ----e
[----c, ----e, --b, --d, a]
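Note that the first output line is already in the order the question asks for, because each letter is emitted at the moment it is first opened. A minimal sketch of that unsorted variant, returning one entry per line, might look like this. The class name NestedLetters and the method render are illustrative only, and a plain loop is used instead of String.repeat so that Java 11 is not required.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NestedLetters {
    // Builds one line per letter, indented by two hyphens per nesting level.
    // Assumption: the input is well formed (every letter appears exactly twice
    // and pairs are properly nested), as the question states.
    static String render(String input) {
        Deque<Character> open = new ArrayDeque<>(); // letters not yet closed
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (!open.isEmpty() && open.peek() == c) {
                open.pop(); // second occurrence: the letter is closed
            } else {
                // first occurrence: depth equals the number of open letters
                for (int k = 0; k < open.size() * 2; k++) {
                    out.append('-');
                }
                out.append(c).append('\n');
                open.push(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("abccbdeeda"));
    }
}
```

No sorting step is needed here, since the traversal order is the display order.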
You can go through the strSplit array and extract the letter from each element into a separate list or array. To check whether an array element contains a letter, you can use a regular expression.
Ex: private final Pattern x = Pattern.compile("[a-z]");
Write a separate method that matches the pattern against each element of the strSplit array. This method returns the letter from your input string.
private String findCharactor(final StringBuilder element) {
final Matcher matcher = x.matcher(element);
if (matcher.find()) {
final int matchIndex = matcher.start(); // index of the first letter in the element
return element.substring(matchIndex);
}
return ""; // no letter found in this element
}
Add the returned letters to a separate array and sort it with a sorting function.
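One way to sketch this idea without a separate array is to sort with a comparator that ignores the leading hyphens. The helper name sortByLetter is hypothetical, and the sketch assumes each entry consists of some hyphens followed by a single letter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class HyphenSort {
    // Returns a copy of items ordered by the text that follows the leading hyphens.
    static List<String> sortByLetter(List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing((String s) -> s.replaceFirst("^-+", "")));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("----c", "----e", "--b", "--d", "a");
        for (String line : sortByLetter(input)) {
            System.out.println(line);
        }
    }
}
```

This matches the expected output only when the letters happen to appear in alphabetical order in the input, as they do in the example.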
Suppose your result list is:
List<String> resultList = Arrays.asList("----c", "----e", "--b", "--d", "a");
You can sort it alphabetically by a single line:
Collections.sort(resultList, (o1, o2) -> new StringBuilder(o1).reverse().toString().compareTo(new StringBuilder(o2).reverse().toString()));
You can use recursion for a depth-first traversal (preorder):
public static String dfs(String string, String prefix) {
if (string.length() == 0) return "";
int i = string.indexOf(string.charAt(0), 1);
return prefix + string.charAt(0) + "\n" // current
+ dfs(string.substring(1, i), prefix + "--") // all nested
+ dfs(string.substring(i + 1), prefix); // all siblings
}
Example call:
public static void main(String[] args) {
System.out.println(dfs("abccbdeeda", ""));
}

Use recursion to find permutations of string using an iterator

As the title implies, I'm having difficulty trying to recursively determine all the permutations of a given String. The catch is that String has to be given through a constructor of an object and then each of the permutations be found one by one. Basically, it has to work like this:
PermutationIterator iter = new PermutationIterator("eat");
while (iter.hasMorePermutations())
System.out.println(iter.nextPermutation());
Here is the code that I'm using but doesn't seem to work and I don't know how to fix it.
public class PermutationIterator {
private String word;
private int pos;
private PermutationIterator tailIterator;
private String currentLetter;
public PermutationIterator(String string) {
word = string;
pos = 0;
currentLetter = string.charAt(pos) + "";
if (string.length() > 1)
tailIterator = new PermutationIterator(string.substring(pos + 1));
}
public String nextPermutation() {
if (word.length() == 1) {
pos++;
return word;
} else if (tailIterator.hasMorePermutations()) {
return currentLetter + tailIterator.nextPermutation();
} else {
pos++;
currentLetter = word.charAt(pos) + "";
String tailString = word.substring(0, pos) + word.substring(pos + 1);
tailIterator = new PermutationIterator(tailString);
return currentLetter + tailIterator.nextPermutation();
}
}
public boolean hasMorePermutations() {
return pos <= word.length() - 1;
}
}
Right now the program prints "eat" and "eta", but after that it throws a StringIndexOutOfBounds error off of the second stack. Any help with solving this is much appreciated.
Rather than just supplying the fix let me help diagnose your issue and then you can have a go at fixing it.
If you look carefully at your code you'll see that the hasMorePermutations condition passes when pos == word.length() - 1. That means nextPermutation will be run when pos is pointing to the last character in the string. But in that case when the third branch executes you increment pos and then call word.substring(pos + 1). At that point pos + 1 will be larger than length of the string which will throw the exception.
I expect the fix will be fairly easy.
Try this code; it generates the permutations of any given string:
package testing;
import java.util.ArrayList;
import java.util.List;
public class Permutations {
/*
 * You will get n! (factorial) permutations from this.
 *
 * For example: abc (3! = 6 permutations) [abc acb bac bca cab
 * cba]
 */
static String str = "abcd";
static char[] ch = str.toCharArray();
static List<String> s1 = new ArrayList<>();
static List<String> s2 = new ArrayList<>();
public static void main(String[] args) {
// s1 - list stores initial character from the string
s1.add(String.valueOf(ch[0]));
// recursive loop - char by char
for (int k = 1; k < ch.length; k++) {
// adds char at index 0 for all elements of previous iteration
appendBefore(s1, ch[k]);
// adds char at last index for all elements of previous iteration
appendAfter(s1, ch[k]);
// adds char at the middle positions, like a^b^c: if the previous list
// stores elements of length 3, there are 2 middle positions to fill
/*
 * say d is the next char: d should be inserted at the ^ positions in
 * _^_^_, where the _ are the previous permutations of the 3 chars
 * a, b, c (i.e. 6 permutations)
 */
appendMiddle(s1, ch[k], k);
// for every iteration first clear s1 - to copy s2, which contains
// previous permutations
s1.clear();
// now copy s2 to s1- then clear s2
// - this way finally s2 contains all the permutations
for (int x = 0; x < s2.size(); x++) {
s1.add(s2.get(x));
}
System.out.println(s1);
System.out.println(s1.size());
s2.clear();
}
}
private static void appendMiddle(List str, char ch, int positions) {
for (int pos = 1; pos <= positions - 1; pos++) {
for (int i = 0; i < str.size(); i++) {
s2.add(str.get(i).toString().substring(0, pos) + String.valueOf(ch)
+ str.get(i).toString().substring(pos, str.get(i).toString().length()));
}
}
}
private static void appendBefore(List str, char ch) {
for (int i = 0; i < str.size(); i++) {
s2.add(String.valueOf(ch) + str.get(i));
}
}
private static void appendAfter(List str, char ch) {
for (int i = 0; i < str.size(); i++) {
s2.add(str.get(i) + String.valueOf(ch));
}
}
}
Make a small change to your hasMorePermutations method, as below, to avoid the StringIndexOutOfBounds exception.
public boolean hasMorePermutations()
{
return pos < word.length() - 1;
}
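For comparison, here is one possible restructuring of the iterator, offered as a sketch rather than the intended solution. It assumes that a word of length 0 or 1 has exactly one permutation, and that a longer word has more permutations while either the front letter is not yet the last letter or the tail iterator still has some. The NoSuchElementException guard is an addition that the question does not ask for.

```java
import java.util.NoSuchElementException;

class PermutationIterator {
    private final String word;
    private int pos;                  // index of the letter currently placed in front
    private PermutationIterator tail; // permutations of the remaining letters

    PermutationIterator(String word) {
        this.word = word;
        this.pos = 0;
        if (word.length() > 1) {
            tail = new PermutationIterator(word.substring(1));
        }
    }

    boolean hasMorePermutations() {
        if (word.length() <= 1) {
            return pos == 0; // the single permutation has not been returned yet
        }
        return pos < word.length() - 1 || tail.hasMorePermutations();
    }

    String nextPermutation() {
        if (!hasMorePermutations()) {
            throw new NoSuchElementException("no more permutations");
        }
        if (word.length() <= 1) {
            pos++;
            return word;
        }
        if (!tail.hasMorePermutations()) {
            // move the next letter to the front and restart the tail
            pos++;
            tail = new PermutationIterator(
                    word.substring(0, pos) + word.substring(pos + 1));
        }
        return word.charAt(pos) + tail.nextPermutation();
    }

    public static void main(String[] args) {
        PermutationIterator iter = new PermutationIterator("eat");
        while (iter.hasMorePermutations()) {
            System.out.println(iter.nextPermutation());
        }
    }
}
```

The key difference from the question's code is that pos is only advanced after checking that the tail is exhausted, and hasMorePermutations consults the tail as well as pos.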

How to search via character matching with a skip distance?

As the title says, I'm working on a project in which I'm searching a given text, Moby Dick in this case, for a keyword. However, instead of the word being contiguous, we are trying to find it with a skip distance (instead of cat, looking for c---a---t).
I've tried multiple approaches, but I can't get it to finish checking one skip distance, find no match, and then move on to the next allowed distance (incrementing by 1 until a preset limit is reached).
The following is the current method that performs this search. Perhaps I'm just missing something silly?
private int[] search()
throws IOException
{
/*
tlength is the text file length,
plength is the length of the
pattern word (cat in the original post),
text[] is a character array of the text file.
*/
int i=0, j;
int match[] = new int[2];
int skipDist = 2;
while(skipDist <= 100)
{
while(i<=tlength-(plength * skipDist))
{
j=plength-1;
while(j>=0 && pattern[j]==text[i+(j * skipDist)])j--;
if (j<0)
{
match[0] = skipDist;
match[1] = i;
return match;
}
else
{
i++;
}
}
skipDist = skipDist + 1;
}
System.out.println("There was no match!");
System.exit(0);
return match;
}
I do not know about the method you posted, but you can use this instead. I've used string and char array for this:
public boolean checkString (String s)
{
char[] check = {'c','a','t'};
int skipDistance = 2;
for(int i = 0; i< (s.length() - (skipDistance*(check.length-1))); i++)
{
boolean checkValid = true;
for(int j = 0; j<check.length; j++)
{
if(!(s.charAt(i + (j*skipDistance))==check[j]))
{
checkValid = false;
}
}
if(checkValid)
return true;
}
return false;
}
Feed the pattern to match in the char array 'check'.
String "adecrayt" evaluates true. String "cat" evaluates false.
Hope this helps.
[This part was for fixed skip distance]
+++++++++++++++++++++++++++
Now for any skip distance between 2 and 100:
public boolean checkString (String s)
{
char[] check = {'c','a','t'};
int index = 0;
int[] arr = new int[check.length];
for(int i = 0; i< (s.length()); i++)
{
if(check[index]==s.charAt(i))
{
arr[index++] = i;
}
}
boolean flag = true;
if(index==(check.length))
{
for(int i = 0; i<arr.length-1; i++)
{
int skip = arr[i+1]-arr[i];
if(!((skip>2)&&(skip<100)))
{
flag = false;
}
else
{
System.out.println("Skip Distance : "+skip);
}
}
}
else
{
flag = false;
}
return flag;
}
If you pass in a String, you only need one line:
public static String search(String s, int skipDist) {
return s.replaceAll(".*(c.{2," + skipDist + "}a.{2," + skipDist + "}t)?.*", "$1");
}
If no match found, a blank will be returned.
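Looking back at the method in the question, the start index i is set to 0 once, before the outer loop, and is never reset, so after the first skip distance is exhausted the inner loop does not run again. A minimal sketch with the start index restarting for every skip distance is shown below. The class name SkipSearch, the parameters minSkip and maxSkip, and the null return for "no match" are choices made for this illustration.

```java
class SkipSearch {
    // Returns {skipDistance, startIndex} for the first match found, trying the
    // skip distances in increasing order, or null when nothing matches.
    static int[] search(char[] text, char[] pattern, int minSkip, int maxSkip) {
        if (pattern.length == 0) {
            return null;
        }
        for (int skip = minSkip; skip <= maxSkip; skip++) {
            // last start index at which the whole pattern still fits
            int lastStart = text.length - 1 - (pattern.length - 1) * skip;
            for (int i = 0; i <= lastStart; i++) { // i restarts at 0 for each skip
                int j = 0;
                while (j < pattern.length && text[i + j * skip] == pattern[j]) {
                    j++;
                }
                if (j == pattern.length) {
                    return new int[] {skip, i};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] match = search("adecrayt".toCharArray(), "cat".toCharArray(), 2, 100);
        System.out.println(match == null
                ? "There was no match!"
                : "skip " + match[0] + " at index " + match[1]);
    }
}
```

Here every pair of neighbouring pattern letters is the same distance apart, which is how the question's code interprets the skip distance.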

Find Shortest Part of Sentence containing given words

Example: suppose the following sentence is given:
My name is not eugene. my pet name is not eugene.
We have to find the smallest part of the sentence that contains the given words
my and eugene
The answer is then
eugene. my.
There is no need to handle uppercase versus lowercase, special characters, or numerics.
I have pasted my code, but it gives the wrong answer for some test cases.
Does anyone have an idea what the problem with the code is? I don't have the test case for which it fails.
import java.io.*;
import java.util.*;
public class ShortestSegment
{
static String[] pas;
static String[] words;
static int k,st,en,fst,fen,match,d;
static boolean found=false;
static int[] loc;
static boolean[] matches ;
public static void main(String s[]) throws IOException
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
pas = in.readLine().replaceAll("[^A-Za-z ]", "").split(" ");
k = Integer.parseInt(in.readLine());
words = new String[k];
matches = new boolean[k];
loc = new int[k];
for(int i=0;i<k;i++)
{
words[i] = in.readLine();
}
en = fen = pas.length;
find(0);
if(found==false)
System.out.println("NO SUBSEGMENT FOUND");
else
{
for(int j=fst;j<=fen;j++)
System.out.print(pas[j]+" ");
}
}
private static void find(int min)
{
if(min==pas.length)
return;
for(int i=0;i<k;i++)
{
if(pas[min].equalsIgnoreCase(words[i]))
{
if(matches[i]==false)
{
loc[i]=min;
matches[i] =true;
match++;
}
else
{
loc[i]=min;
}
if(match==k)
{
en=min;
st = min();
found=true;
if((fen-fst)>(en-st))
{
fen=en;
fst=st;
}
match--;
matches[getIdx()]=false;
}
}
}
find(min+1);
}
private static int getIdx()
{
for(int i=0;i<k;i++)
{
if(words[i].equalsIgnoreCase(pas[st]))
return i;
}
return -1;
}
private static int min()
{
int min=loc[0];
for(int i=1;i<loc.length;i++)
if(min>loc[i])
min=loc[i];
return min;
}
}
The code you've given will produce incorrect output for the following input. I'm assuming that word length also matters when you want to 'Find Shortest Part of Sentence containing given words'.
String: 'My firstname is eugene. My fn is eugene.'
Number of search strings: 2
string1: 'my'
string2: 'is'
Your solution is: 'My firstname is'
The correct answer is: 'My fn is'
The problem in your code is that it treats 'firstname' and 'fn' as having the same length. In the comparison (fen-fst)>(en-st) you only consider whether the number of words has decreased, not whether the total length of the words has.
The following code (JUnit):
@Test
public void testIt() {
final String s = "My name is not eugene. my pet name is not eugene.";
final String tmp = s.toLowerCase().replaceAll("[^a-zA-Z]", " ");//here we need the placeholder (blank)
final String w1 = "my "; // leave a blank at the end to avoid those words e.g. "myself", "myth"..
final String w2 = "eugene ";//same as above
final List<Integer> l1 = getList(tmp, w1); //indexes list
final List<Integer> l2 = getList(tmp, w2);
int min = Integer.MAX_VALUE;
final int[] idx = new int[] { 0, 0 };
//loop to find out the result
for (final int i : l1) {
for (final int j : l2) {
if (Math.abs(j - i) < min) {
final int x = j - i;
min = Math.abs(j - i);
idx[0] = j - i > 0 ? i : j;
idx[1] = j - i > 0 ? j + w2.length() + 2 : i + w1.length() + 2;
}
}
}
System.out.println("indexes: " + Arrays.toString(idx));
System.out.println("result: " + s.substring(idx[0], idx[1]));
}
private List<Integer> getList(final String input, final String search) {
String t = new String(input);
final List<Integer> list = new ArrayList<Integer>();
int tmp = 0;
while (t.length() > 0) {
final int x = t.indexOf(search);
if (x < 0 || x > t.length()) {
break;
}
tmp += x;
list.add(tmp);
t = t.substring(search.length() + x);
}
return list;
}
This gives the output:
indexes: [15, 25]
result: eugene. my
I think the code with its inline comments is fairly easy to understand. Basically, it plays with index + word length.
Notes:
The "not found" case is not implemented.
The code just shows the idea; it can be optimized, e.g. at least one abs() call could be saved.
Hope it helps.
I think it can be handled in another way:
First, find a matching result, narrow the bounds to that result, and then look for a shorter match inside it. It can be coded as follows:
/** This method finds the shortest interval between two words.
 * @param s the string to be processed
 * @param first one of the words
 * @param second the other word
 */
public static void getShortestInterval(String s , String first , String second)
{
String situationOne = first + "(.*?)" + second;
String situationTwo = second + "(.*?)" + first;
Pattern patternOne = Pattern.compile(situationOne,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
Pattern patternTwo = Pattern.compile(situationTwo,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
List<Integer> result = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
/**first , test the first choice*/
Matcher matcherOne = patternOne.matcher(s);
findTheMax(first.length(),matcherOne, result);
/**then , test the second choice*/
Matcher matcherTwo = patternTwo.matcher(s);
findTheMax(second.length(),matcherTwo,result);
if(result.get(0)!=Integer.MAX_VALUE)
{
System.out.println("The shortest length is " + result.get(0));
System.out.println("Which start @ " + result.get(1));
System.out.println("And end @ " + result.get(2));
}else
System.out.println("No matching result is found!");
}
private static void findTheMax(int headLength , Matcher matcher , List<Integer> result)
{
int length = result.get(0);
int startIndex = result.get(1);
int endIndex = result.get(2);
while(matcher.find())
{
int temp = matcher.group(1).length();
int start = matcher.start();
List<Integer> minimize = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
System.out.println(matcher.group().substring(headLength));
findTheMax(headLength, matcher.pattern().matcher(matcher.group().substring(headLength)), minimize);
if(minimize.get(0) != Integer.MAX_VALUE)
{
start = start + minimize.get(1) + headLength;
temp = minimize.get(0);
}
if(temp<length)
{
length = temp;
startIndex = start;
endIndex = matcher.end();
}
}
result.set(0, length);
result.set(1, startIndex);
result.set(2, endIndex);
}
Note that this handles both orderings of the two words.
You can use the Knuth-Morris-Pratt algorithm to find the indexes of all occurrences of each given word in your text. Suppose you have a text of length N and M words (w1 ... wM). Using KMP you can build an array:
occur = string[N];
occur[i] = 1, if w1 starts at position i
...
occur[i] = M, if wM starts at position i
occur[i] = 0, if no word from w1...wM starts at position i
Then loop through this array and, from every non-zero position, search forward for the other M-1 words.
This is approximate pseudocode, just to convey the idea. It will not work if you simply transcribe it into Java:
for i=0 to N-1 {
if occur[i] != 0 {
for j = i + w[occur[i] - 1].length - 1 { // searching forward
if occur[j] != 0 and !foundWords.contains(occur[j]) {
foundWords.add(occur[j]);
lastWordInd = j;
if foundWords.containAllWords() break;
}
foundTextPieceLen = j + w[occur[lastWordInd]].length - i;
if foundTextPieceLen < minTextPieceLen {
minTextPieceLen = foundTextPieceLen;
// also remember start and end indexes of text piece
}
}
}
}
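A different way to approach the same problem is a sliding window over the words of the sentence, which needs only one pass. The sketch below is not the KMP approach described above. It measures "shortest" by number of words, as the code in the question does, and compares words case-insensitively. The class name ShortestSegmentSketch and the method shortest are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class ShortestSegmentSketch {
    // Returns {firstWordIndex, lastWordIndex} of the shortest run of words that
    // contains every target at least once, or null if there is no such run.
    static int[] shortest(String[] words, String[] targets) {
        Map<String, Integer> need = new HashMap<>(); // target -> count inside window
        for (String t : targets) {
            need.put(t.toLowerCase(), 0);
        }
        if (need.isEmpty()) {
            return null;
        }
        int covered = 0; // number of distinct targets present in the window
        int left = 0;
        int[] best = null;
        for (int right = 0; right < words.length; right++) {
            String w = words[right].toLowerCase();
            if (need.containsKey(w) && need.merge(w, 1, Integer::sum) == 1) {
                covered++;
            }
            // shrink from the left while the window still contains every target
            while (covered == need.size()) {
                if (best == null || right - left < best[1] - best[0]) {
                    best = new int[] {left, right};
                }
                String l = words[left++].toLowerCase();
                if (need.containsKey(l) && need.merge(l, -1, Integer::sum) == 0) {
                    covered--;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String sentence = "My name is not eugene. my pet name is not eugene.";
        String[] words = sentence.replaceAll("[^A-Za-z ]", "").split(" ");
        int[] range = shortest(words, new String[] {"my", "eugene"});
        if (range == null) {
            System.out.println("NO SUBSEGMENT FOUND");
        } else {
            for (int i = range[0]; i <= range[1]; i++) {
                System.out.print(words[i] + " ");
            }
        }
    }
}
```

If the length in characters matters instead, the comparison inside the while loop would need to sum the word lengths rather than count words.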
