How to find repetitive patterns in char array? - java

I have an array of characters:
a b c x y d e f x y a b c t r e a b c
How can I find repetitive patterns of sizes 2 onwards?
The array needs to be traversed from the end. In the example I need to find the size-2 patterns b c, a b and x y and the size-3 patterns a b c and x y z, along with the indices of the matching chars.
So far I have tried to traverse the array backwards and find patterns:
for (int size = 2; size < aLT.size(); size++) {
for (int i = aLT.size() - 1; i >= 0; i--) {
// code here
}
}

This will do the job; you can change the patternSize variable to any value smaller than the length of the input string.
It takes advantage of the String#contains() method to look for each pattern in the rest of the input.
public static void main(String[] args) {
int patternSize=4;
String input = "abcxydefxyabctreabcabcx";
Set<String> patterns = new TreeSet<String>();
// test size n patterns
// test every window of length patternSize, including the last one
for (int i=0; i<=input.length()-patternSize; i++){
String pattern = input.substring(i, i+patternSize);
// cut the window out of the input; the '\0' separator (assumed absent from the input)
// keeps the two remaining halves from forming a false match across the cut
String tester = input.substring(0,i)+'\0'+input.substring(i+patternSize);
if (tester.contains(pattern)){
patterns.add(pattern);
}
}
System.out.println("Size "+patternSize+" patterns finder");
for(String aPattern : patterns){
System.out.println("The pattern "+aPattern+" was found several times");
}
}

int n = 2; // in your case 2 and 3
Map<String, List<Integer>> matches = new HashMap<String, List<Integer>>();
String charsString = new String( chars );
String current = null;
String rest = null;
for( int i = chars.length - n; i >= 0; i-- ) {
current = charsString.substring( i, i + n );
rest = charsString.substring( 0, i );
int index = rest.indexOf( current );
if( index > -1 ) {
if( matches.containsKey( current ) ) {
continue;
}
List<Integer> indices = new ArrayList<Integer>();
indices.add( i );
while( index > -1 ) {
indices.add( index );
index = rest.indexOf( current, index + 1 );
}
matches.put( current, indices );
}
}
// print the results
for( Entry<String, List<Integer>> match : matches.entrySet() ) {
System.out.println( match.getKey() + " with indices: " + match.getValue() );
}
And the output is:
ab with indices: [16, 0, 10]
bc with indices: [17, 1, 11]
xy with indices: [8, 3]
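The answer above handles one pattern size per run. As a rough sketch of how all sizes from 2 upwards could be collected in one pass, still walking the array from the end, something like the following might work. The class name RepeatedPatterns and the method find are made up for illustration, and overlapping occurrences are counted as separate matches, which is an assumption about what the question wants.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the name and signature are assumptions for illustration.
class RepeatedPatterns {

    // Maps every substring of length >= minSize that occurs at least twice
    // to the start indices of its occurrences, collected from the end of the input.
    static Map<String, List<Integer>> find(char[] chars, int minSize) {
        String s = new String(chars);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int size = minSize; size <= s.length() - 1; size++) {
            Map<String, List<Integer>> seen = new LinkedHashMap<>();
            // walk backwards, as the question requires
            for (int i = s.length() - size; i >= 0; i--) {
                seen.computeIfAbsent(s.substring(i, i + size), k -> new ArrayList<>()).add(i);
            }
            // keep only the substrings that were seen more than once
            for (Map.Entry<String, List<Integer>> e : seen.entrySet()) {
                if (e.getValue().size() > 1) {
                    result.put(e.getKey(), e.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        char[] chars = "abcxydefxyabctreabc".toCharArray();
        find(chars, 2).forEach((pattern, indices) ->
                System.out.println(pattern + " with indices: " + indices));
    }
}
```

This builds every substring of every size, so it is quadratic in the input length at best; for the short arrays in the question that should not matter.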

Here is a method that does what you want. To find patterns of a different size, change patternSize and the strings that are added to the set. At the moment it returns the number of matches, but you can easily modify it to return something else, such as the indices where the matches begin or a boolean indicating whether there are any matches.
public static int findPatterns(char[] charArray) {
int patternSize = 2;
Set<String> patterns = new HashSet<>();
patterns.add("bc");
patterns.add("ab");
patterns.add("xy");
int count = 0;
if (charArray.length < patternSize) {
return 0;
}
for (int i = 0; i < charArray.length - patternSize + 1; i++) {
String pattern = "";
for (int j = i; j < i + patternSize; j++) {
pattern += charArray[j];
}
if (patterns.contains(pattern)) {
count++;
}
}
return count;
}

Related

Will be Array of ArrayList or Double dimension array be the better data structure for this java translation program

I'm writing some research Java code that manipulates strings and I'm stuck;
I need some help with ideas and with a proper data structure and algorithm for the problem below.
I have created a simplified version of the problem as a separate Java class in the code below.
I need to translate the input string s_in letter by letter and return a String[] array R with all possible translations.
Each letter is translated by translate_letter(), which takes a letter and returns a String[] array out.
The complication is that the size of the out array can be anything from 1 to 4.
I have gathered the initial information into arrays, a list of arrays and a two-dimensional array.
I couldn't get any further, because I cannot work out how to actually produce the final output array R[]: I need a data structure and algorithm that builds the translations dynamically, regardless of the number of letters in the input string (anything from 1 to 10 in practice).
Test cases:
1) If s_in = "a"
R[] will be just R[] = { "a1" }
2) If s_in = "ab"
R[] = { "a1b1", "a1b2" }
3) If s_in = "abc"
R[] = { "a1b1c1", "a1b1c2", "a1b1c3", "a1b2c1", "a1b2c2", "a1b2c3" }
4) If s_in = "db"
R[] = { "d1b1", "d1b2", "d2b1", "d2b2", "d3b1", "d3b2", "d4b1", "d4b2" }
This is my final version of the code:
import java.util.*;
public class translate_string {
//------------------------------------------------------------
/*
This is a simplified version to set up a proper data structure
and algorithm.
Need to translate input string s_in
letter by letter to return String[] array R with
all possible translations.
*/
public static void main (String args[]) throws Exception {
// input string to translate
String s_in = "a" + "b"; //
// split String s_in into String[] array of each letter
String[] lt = s_in.split("");
// len of lt array (how many letters)
int lt_len = lt.length;
// array - how many combo for each letter
int lt_cmb[] = new int[lt_len];
// https://stackoverflow.com/questions/3642205/java-arraylist-of-arrays
//Declaration of an ArrayList of String Arrays
ArrayList<String[]> lofAL = new ArrayList<String[]>();
// storage for temp array of returned translations
String[] array;
// total qty of combo from s_in: N = lt_cmb[0]*lt_cmb[1]*lt_cmb[2]...
int N = 1;
// max length of returned translations
int max = 0;
int i = 0; // loop vars
int j = 0;
int k = 0;
for (i = 0; i < lt_len; i++ ) { // for i
// lt[0] = a lt[1] = b - each letter from ori s_in
// ori lt[] array of letters
// in array of String[] - all combo for letter lt[i]
array = translate_letter(lt[i]);
// add each array as next element of ArrayList lofAL
lofAL.add(array);
// add len of each array (letters) into lt_cmb
lt_cmb[i] = array.length;
if ( lt_cmb[i] > max ) { max = lt_cmb[i]; }
// multiply total qty of combo
N = N * lt_cmb[i];
System.out.println("lt[" + i + "]=" + lt[i] + " lt_cmb[" + i + "]=" + lt_cmb[i]);
} // end for i
System.out.println("N=" + N + " lt_len=" + lt_len + " lofAL.size()=" + lofAL.size() + " max=" + max);
// lt_len must be = lofAL.size()
// 2-dim matrix array of letters
String[][] F = new String[lofAL.size()][max];
for (i = 0; i < lofAL.size(); i++ ) {
for (j = 0; j < max; j++ ) {
F[i][j] = ""; // ini
}
}
// final ready array R
String[] R = new String[N];
for (i = 0; i < N; i++ ) {
R[i] = "";
}
int count = 0;
for( i = 0; i < lofAL.size(); i++ ) { // for i 2
count = 0;
for( j = 0; j < lofAL.get(i).length; j++ ) {
F[i][j] = F[i][j] + lofAL.get(i)[j];
/*
if ( count == 0 ) {
R[i] = R[i] + lofAL.get(i)[j];
k = k + 1;
count = 1;
}
if ( count == 1 ) {
R[i] = R[i] + lofAL.get(i)[j];
count = 2;
}
*/
// Printing out the ArrayList Contents of String Arrays
// '$' is used to indicate the String elements of String Arrays
System.out.printf(" $ " + lofAL.get(i)[j]);
}
System.out.println("---------------");
} // end for i 2
for ( i = 0; i < lt_len; i++ ) { // for i3
for ( j = 0; j < max; j++ ) {
System.out.println("F[" + i + "][" + j + "]=" + F[i][j]);
}
} // end for i3
for ( k = 0; k < N; k++ ) { // for k
j = 0;
for ( i = 0; i < lt_len; i++ ) { // for i
if ( lt_cmb[0] == 1 ) { j = 0; }
R[k] = R[k] + F[i][j];
} // end for i
} // end for k
System.out.println("================");
for (i = 0; i < N; i++ ) {
System.out.println("R[" + i + "]=" + R[i]);
}
} // end main()
//------------------------------------------------------------
// translate letters - return array of diff combinations
public static String[] translate_letter(String s) {
ArrayList<String> o = new ArrayList<>(1);
if ( s.equals("a") ) { // if a
o.add("a1");
} else {
if ( s.equals("b") ) { // if b
o.add("b1"); o.add("b2");
} else {
if ( s.equals("c") ) { // if c
o.add("c1"); o.add("c2"); o.add("c3");
} else {
if ( s.equals("d") ) { // if d
o.add("d1"); o.add("d2"); o.add("d3"); o.add("d4");
} else {
o.add(s); // s = def add (if nothing above matches)
} // end if d
} // end if c
} // end if b
} // end if a
//Convert ArrayList o to string array[]
String[] array = o.toArray(new String[o.size()]);
return array;
} // end translate_letter()
//------------------------------------------------------------
} // end class()
//---------------------------------
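
The output described in the test cases is the Cartesian product of the per-letter translations, so one possible approach is to grow a list of partial results one letter at a time instead of pre-sizing a matrix. The sketch below is only an illustration under that reading: TranslationCombiner, translateLetter and translateAll are hypothetical names, and translateLetter is a compact stand-in for translate_letter() from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class; the names are assumptions, not part of the question's code.
class TranslationCombiner {

    // Stand-in for translate_letter(): each letter maps to 1..4 variants.
    static String[] translateLetter(String s) {
        switch (s) {
            case "a": return new String[] {"a1"};
            case "b": return new String[] {"b1", "b2"};
            case "c": return new String[] {"c1", "c2", "c3"};
            case "d": return new String[] {"d1", "d2", "d3", "d4"};
            default:  return new String[] {s};
        }
    }

    // Builds all combinations by extending every partial result
    // with every translation of the next letter.
    static String[] translateAll(String input) {
        List<String> partial = new ArrayList<>();
        partial.add(""); // one empty prefix to start from
        for (int i = 0; i < input.length(); i++) {
            String[] options = translateLetter(String.valueOf(input.charAt(i)));
            List<String> next = new ArrayList<>(partial.size() * options.length);
            for (String prefix : partial) {
                for (String option : options) {
                    next.add(prefix + option);
                }
            }
            partial = next;
        }
        return partial.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(translateAll("abc")));
    }
}
```

Because the list is rebuilt for each letter, no array has to be sized in advance; with up to 10 letters and at most 4 variants each, the result holds at most 4^10 strings.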

Most efficient way to search for unknown patterns in a string?

I am trying to find patterns that:
occur more than once
are more than 1 character long
are not substrings of any other known pattern
without knowing any of the patterns that might occur.
For example:
The string "the boy fell by the bell" would return 'ell', 'the b', 'y '.
The string "the boy fell by the bell, the boy fell by the bell" would return 'the boy fell by the bell'.
Using double for-loops, it can be brute forced very inefficiently:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
However, this is incredibly slow when searching large strings with tons of patterns.
You can build a suffix tree for your string in linear time:
https://en.wikipedia.org/wiki/Suffix_tree
The patterns you are looking for are the strings corresponding to internal nodes that have only leaf children.
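A linear-time suffix tree is too long to show here, but the underlying idea (repeated substrings are shared prefixes of suffixes) can be illustrated with a much simpler and slower stand-in: sort all suffixes and compare neighbours. This is not the suffix tree algorithm itself, only a naive sketch that finds the single longest repeated substring; the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Naive suffix-sorting sketch (hypothetical names); not a linear-time suffix tree.
class SuffixSortSketch {

    // Returns the longest substring that occurs at least twice (overlaps allowed),
    // or the empty string when nothing repeats.
    static String longestRepeated(String s) {
        int n = s.length();
        String[] suffixes = new String[n];
        for (int i = 0; i < n; i++) {
            suffixes[i] = s.substring(i);
        }
        Arrays.sort(suffixes);
        String best = "";
        // after sorting, the longest shared prefix is always between two neighbours
        for (int i = 1; i < n; i++) {
            int len = commonPrefixLength(suffixes[i - 1], suffixes[i]);
            if (len > best.length()) {
                best = suffixes[i].substring(0, len);
            }
        }
        return best;
    }

    static int commonPrefixLength(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(longestRepeated("the boy fell by the bell"));
    }
}
```

Materialising every suffix costs quadratic memory, so this is only suitable for small inputs; a real suffix tree or suffix array avoids that.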
You could use n-grams to find patterns in a string. It would take O(n) time to scan the string for n-grams. When you find a substring by using a n-gram, put it into a hash table with a count of how many times that substring was found in the string. When you're done searching for n-grams in the string, search the hash table for counts greater than 1 to find recurring patterns in the string.
For example, in the string "the boy fell by the bell, the boy fell by the bell" using a 6-gram will find the substring "the boy fell by the bell". A hash table entry with that substring will have a count of 2 because it occurred twice in the string. Varying the number of words in the n-gram will help you discover different patterns in the string.
// requires: using System.Text; using System.Collections.Generic;
Dictionary<string, int> dict = new Dictionary<string, int>();
int count = 0;
int ngramcount = 6;
string substring = "";
// Add entries to the hash table
while (count < str.Length) {
// copy the next ngramcount words into the substring
StringBuilder sb = new StringBuilder();
while (ngramcount > 0 && count < str.Length) {
sb.Append(str[count]);
if (str[count] == ' ')
ngramcount--;
count++;
}
ngramcount = 6;
substring = sb.ToString().Trim(); // get rid of the last blank in the substring
// Update the dictionary (hash table) with the substring
if (dict.ContainsKey(substring)) // substring is already in hash table so increment the count
dict[substring]++;
else
dict[substring] = 1;
}
// Find the most commonly occurring pattern in the string
// by searching the hash table for the greatest count.
int maxCount = 0;
string mostCommonPattern = "";
foreach (KeyValuePair<string, int> pair in dict) {
if (pair.Value > maxCount) {
maxCount = pair.Value;
mostCommonPattern = pair.Key;
}
}
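For readers working in Java, the same counting idea might look roughly like the sketch below. It differs from the snippet above in one respect that should be stated plainly: it slides the window one word at a time instead of cutting the text into consecutive chunks, so a repeated phrase is found wherever it starts. NGramCounter and countWordNGrams are hypothetical names, and the input is assumed to be already stripped of punctuation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for counting word n-grams; names are assumptions.
class NGramCounter {

    // Counts every run of n consecutive words, moving the window by one word each step.
    static Map<String, Integer> countWordNGrams(String text, int n) {
        String[] words = text.trim().split("\\s+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            counts.merge(gram, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "the boy fell by the bell the boy fell by the bell";
        countWordNGrams(text, 6).forEach((gram, count) -> {
            if (count > 1) {
                System.out.println(gram + " occurs " + count + " times");
            }
        });
    }
}
```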
I've written this just for fun. I hope I have understood the problem correctly, this is valid and fast enough; if not, please be easy on me :) I might optimize it a little more I guess, if someone finds it useful.
private static IEnumerable<string> getPatterns(string txt)
{
char[] arr = txt.ToArray();
BitArray ba = new BitArray(arr.Length);
for (int shingle = getMaxShingleSize(arr); shingle >= 2; shingle--)
{
char[] arr1 = new char[shingle];
int[] indexes = new int[shingle];
HashSet<int> hs = new HashSet<int>();
Dictionary<int, int[]> dic = new Dictionary<int, int[]>();
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
{
int index = i + j;
arr1[j] = arr[index];
indexes[j] = index;
}
int h = getHashCode(arr1);
if (hs.Add(h))
{
int[] indexes1 = new int[indexes.Length];
Buffer.BlockCopy(indexes, 0, indexes1, 0, indexes.Length * sizeof(int));
dic.Add(h, indexes1);
}
else
{
bool exists = false;
foreach (int index in indexes)
if (ba.Get(index))
{
exists = true;
break;
}
if (!exists)
{
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
if (ba.Get(index))
{
exists = true;
break;
}
}
if (!exists)
{
foreach (int index in indexes)
ba.Set(index, true);
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
ba.Set(index, true);
dic[h] = null;
yield return new string(arr1);
}
}
}
}
}
private static int getMaxShingleSize(char[] arr)
{
for (int shingle = 2; shingle <= arr.Length / 2 + 1; shingle++)
{
char[] arr1 = new char[shingle];
HashSet<int> hs = new HashSet<int>();
bool noPattern = true;
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
arr1[j] = arr[i + j];
int h = getHashCode(arr1);
if (!hs.Add(h))
{
noPattern = false;
break;
}
}
if (noPattern)
return shingle - 1;
}
return -1;
}
private static int getHashCode(char[] arr)
{
unchecked
{
int hash = (int)2166136261;
foreach (char c in arr)
hash = (hash * 16777619) ^ c.GetHashCode();
return hash;
}
}
Edit
My previous code has serious problems. This one is better:
private static IEnumerable<string> getPatterns(string txt)
{
Dictionary<int, int> dicIndexSize = new Dictionary<int, int>();
for (int shingle = 2, count0 = txt.Length / 2 + 1; shingle <= count0; shingle++)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
bool patternExists = false;
for (int i = 0, count = txt.Length - shingle; i <= count; i++)
{
string sub = txt.Substring(i, shingle);
if (!dic.ContainsKey(sub))
dic.Add(sub, i);
else
{
patternExists = true;
int index0 = dic[sub];
if (index0 >= 0)
{
dicIndexSize[index0] = shingle;
dic[sub] = -1;
}
}
}
if (!patternExists)
break;
}
List<int> lst = dicIndexSize.Keys.ToList();
lst.Sort((a, b) => dicIndexSize[b].CompareTo(dicIndexSize[a]));
BitArray ba = new BitArray(txt.Length);
foreach (int i in lst)
{
bool ok = true;
int len = dicIndexSize[i];
for (int j = i, max = i + len; j < max; j++)
{
if (ok) ok = !ba.Get(j);
ba.Set(j, true);
}
if (ok)
yield return txt.Substring(i, len);
}
}
The text of this book took 3.4 s on my computer.
Suffix arrays are the right idea, but there's a non-trivial piece missing, namely, identifying what are known in the literature as "supermaximal repeats". Here's a GitHub repo with working code: https://github.com/eisenstatdavid/commonsub . Suffix array construction uses the SAIS library, vendored in as a submodule. The supermaximal repeats are found using a corrected version of the pseudocode from findsmaxr in Efficient repeat finding via suffix arrays (Becher–Deymonnaz–Heiber).
static void FindRepeatedStrings(void) {
// findsmaxr from https://arxiv.org/pdf/1304.0528.pdf
printf("[");
bool needComma = false;
int up = -1;
for (int i = 1; i < Len; i++) {
if (LongCommPre[i - 1] < LongCommPre[i]) {
up = i;
continue;
}
if (LongCommPre[i - 1] == LongCommPre[i] || up < 0) continue;
for (int k = up - 1; k < i; k++) {
if (SufArr[k] == 0) continue;
unsigned char c = Buf[SufArr[k] - 1];
if (Set[c] == i) goto skip;
Set[c] = i;
}
if (needComma) {
printf("\n,");
}
printf("\"");
for (int j = 0; j < LongCommPre[up]; j++) {
unsigned char c = Buf[SufArr[up] + j];
if (iscntrl(c)) {
printf("\\u%.4x", c);
} else if (c == '\"' || c == '\\') {
printf("\\%c", c);
} else {
printf("%c", c);
}
}
printf("\"");
needComma = true;
skip:
up = -1;
}
printf("\n]\n");
}
Here's a sample output on the text of the first paragraph:
Davids-MBP:commonsub eisen$ ./repsub input
["\u000a"
," S"
," as "
," co"
," ide"
," in "
," li"
," n"
," p"
," the "
," us"
," ve"
," w"
,"\""
,"–"
,"("
,")"
,". "
,"0"
,"He"
,"Suffix array"
,"`"
,"a su"
,"at "
,"code"
,"com"
,"ct"
,"do"
,"e f"
,"ec"
,"ed "
,"ei"
,"ent"
,"ere's a "
,"find"
,"her"
,"https://"
,"ib"
,"ie"
,"ing "
,"ion "
,"is"
,"ith"
,"iv"
,"k"
,"mon"
,"na"
,"no"
,"nst"
,"ons"
,"or"
,"pdf"
,"ri"
,"s are "
,"se"
,"sing"
,"sub"
,"supermaximal repeats"
,"te"
,"ti"
,"tr"
,"ub "
,"uffix arrays"
,"via"
,"y, "
]
I would use the Knuth–Morris–Pratt algorithm (linear time complexity, O(n)) to find substrings. I would try to find the largest substring pattern, remove it from the input string, then try to find the second largest, and so on. I would do something like this:
String pattern = input.substring(0, input.length() / 2);
String toMatchString = input.substring(pattern.length());
List<String> matches = new ArrayList<>();
while (pattern.length() > 0)
{
int index = KMP(pattern, toMatchString);
if (index >= 0) // KMP is assumed to return -1 when there is no match
{
matches.add(pattern);
// remove the matched pattern occurrences from the input string
// I would do something like this:
// 0 to pattern.length() gets removed
// check for all occurrences of pattern in toMatchString and remove them
// get the remaining shrunken input, reassign values for pattern & toMatchString
// keep looking for the next largest substring
}
else
{
pattern = input.substring(0, pattern.length() - 1);
toMatchString = input.substring(pattern.length());
}
}
Here KMP implements the Knuth–Morris–Pratt algorithm. You can find Java implementations of it on GitHub or at Princeton, or write it yourself.
PS: I don't code in Java, and this is a quick attempt at my first bounty, which is about to close. So please don't give me the stick if I missed something trivial or made a +/-1 error.
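The answer leaves the search routine itself undefined. A minimal Knuth–Morris–Pratt search in Java could look like the sketch below; the class name Kmp and the method indexOf are assumptions chosen for illustration, with the same argument order (pattern first, text second) as the call in the answer, and -1 is assumed as the "not found" result.

```java
// Hypothetical KMP helper; the name and return convention are assumptions.
class Kmp {

    // Returns the index of the first occurrence of pattern in text, or -1 if there is none.
    static int indexOf(String pattern, String text) {
        if (pattern.isEmpty()) {
            return 0;
        }
        // fail[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of it
        int[] fail = new int[pattern.length()];
        for (int i = 1, k = 0; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        // scan the text once, falling back through fail[] on a mismatch
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abc", "xxabcabc"));
    }
}
```

The text index never moves backwards, which is where the linear running time comes from.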

How to improve the algorithm for the Max Common Array Slice?

I was asked to take a HackerRank code test, and the exercise I was asked is the Max Common Array Slice. The problem goes as follows:
You are given a sequence of n integers a0, a1, . . . , an−1 and the
task is to find the maximum slice of the array which contains no more
than two different numbers.
Input 1 :
[1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 6, 2, 1, 8]
Result 1 : Answer is 10 because the array slice of (0, 9) is the
largest slice of the array with no more than two different numbers.
There are 10 items in this slice which are "1, 1, 1, 2, 2, 2, 1, 1, 2, 2".
2 different numbers for this slice are 1 and 2.
Input 2:
[53, 800, 0, 0, 0, 356, 8988, 1, 1]
Result 2: Answer is 4 because the slice of (1, 4) is the largest slice
of the array with no more than two different numbers. The slice (2, 5)
would also be valid and would still give a result of 4.
There are 4 items in this slice which are "800,0,0,0".
2 different numbers for this slice are 800 and 0.
Maximum common array slice of the array which contains no more than
two different numbers implementation in Java takes a comma delimited
array of numbers from STDIN and the output is written back to console.
The implementation I provide (below) works, however 3 test cases timeout in HR. Clearly, HR hides the test cases, so I could not see exactly the conditions the timeout was triggered or even the length of the timeout.
I'm not surprised by the timeout, though, given the asymptotic complexity of my solution. But my question is: how could my solution be improved?
Thanks in advance to all those that will help!
import java.io.*;
import java.lang.*;
import java.util.*;
import java.util.stream.*;
public class Solution {
public static void main(String args[] ) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
List<Integer> inputSequence = parseIntegerSequence(line);
int largestSliceSize = calculateLargestSlice(inputSequence);
System.out.println(largestSliceSize);
}
private static List<Integer> parseIntegerSequence(String input) {
if (input == null)
return new ArrayList();
return Arrays.asList(input.split("\\s*,\\s*"))
.stream()
.filter(item -> item.matches("^\\s*-?[0-9]+\\s*$"))
.map(item -> Integer.parseInt(item))
.collect(Collectors.toList());
}
private static int calculateLargestSlice(List<Integer> inputSequence) {
Map<Integer, Integer> temporaryMap = new HashMap<>();
int result = 0;
int startIndex = 0;
int uniqueItemCount = 0;
Integer[] array = inputSequence.toArray(new Integer[inputSequence.size()]);
while (startIndex < array.length) { // loop over the entire input sequence
temporaryMap.put(array[startIndex], startIndex);
uniqueItemCount++;
for (int j = startIndex + 1; j < array.length; j++) {
if (temporaryMap.get(array[j]) == null) {
if (uniqueItemCount != 2) {
temporaryMap.put(array[j], j);
uniqueItemCount++;
if (j == array.length - 1) {
result = Math.max(result, j - startIndex + 1);
startIndex = array.length;
break;
}
} else {
result = Math.max(result, j - startIndex);
int item = array[j-1];
int firstFoundIndex = 0;
for( int k=j-1; k>=0; k-- )
{
if( array[k] != item )
{
firstFoundIndex = k+1;
break;
}
}
startIndex = firstFoundIndex;
temporaryMap.clear();
uniqueItemCount = 0;
break;
}
} else if (temporaryMap.get(array[j]) != null) {
if (j == array.length - 1) {
result = Math.max(result, j - startIndex + 1);
startIndex = array.length;
break;
}
}
}
}
return result;
}
}
This is my answer in Java and it passed all the HackerRank test cases. Please feel free to comment if you find something wrong.
public static int maxCommonArraySlice(List<Integer> inputSequence) {
if(inputSequence.size() < 2) return inputSequence.size(); // I'm doubting this line should be <= 2
List<Integer> twoInts = new LinkedList<>();
twoInts.add(inputSequence.get(0));
int start = 0;
int end = inputSequence.size();
int count = 0;
int max_length = 0;
while(start < end) {
if(twoInts.contains(inputSequence.get(start))) {
count++;
start++;
}
else if(twoInts.size() == 1) {
twoInts.add(inputSequence.get(start));
}
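else { // twoInts.size() is 2: a third value appeared at start
// restart at the beginning of the run of equal values that ends just before start,
// so that the whole run is counted together with the new value
Integer previous = inputSequence.get(start - 1);
int runStart = start - 1;
while(runStart > 0 && inputSequence.get(runStart - 1).equals(previous)) {
runStart--;
}
count = 0;
twoInts.set(0, previous);
twoInts.set(1, inputSequence.get(start));
start = runStart;
}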
if(count > max_length) {
max_length = count;
}
}
return max_length;
}
public static void main(String[] args) {
List<Integer> input = new LinkedList<Integer>(Arrays.asList(53,800,0,0,0,356,8988,1,1));
System.out.println(maxCommonArraySlice(input));
}
I think this would work:
public int solution(int[] arr) {
int lastSeen = -1;
int secondLastSeen = -1;
int lbs = 0;
int tempCount = 0;
int lastSeenNumberRepeatedCount = 0;
for (int current : arr) {
if (current == lastSeen || current == secondLastSeen) {
tempCount ++;
} else {
// if the current number is not in our read list it means new series has started, tempCounter value in this case will be
// how many times lastSeen number repeated before this new number encountered + 1 for current number.
tempCount = lastSeenNumberRepeatedCount + 1;
}
if (current == lastSeen) {
lastSeenNumberRepeatedCount++;
} else {
lastSeenNumberRepeatedCount = 1;
secondLastSeen = lastSeen;
lastSeen = current;
}
lbs = Math.max(tempCount, lbs);
}
return lbs;
}
This is a Python solution, as requested by the OP:
def solution(arr):
    if len(arr) <= 2:
        print(arr)
        return  # nothing more to scan; also avoids indexing arr[1] on short input
    lres = 0
    rres = 0
    l = 0
    r = 1
    last = arr[1]
    prev = arr[0]
    while r <= len(arr) - 1:
        if prev != last:
            if arr[r] == prev:
                prev = last
                last = arr[r]
            elif arr[r] != last:
                l = r - 1
                while arr[l - 1] == last:
                    l -= 1
                last = arr[r]
                prev = arr[r - 1]
        else:
            if arr[r] != prev:
                last = arr[r]
        if r - l > rres - lres:
            rres = r
            lres = l
        r += 1
    print(arr[lres:rres + 1])
For the current segment, let's say that last is the last value added and prev is the other distinct value in the segment (initially they might be equal).
Let's keep two pointers l and r at the left and right ends of the current segment with at most two distinct elements, and say we are considering element arr[r].
If the current segment [l, r-1] contains only one distinct element, we can safely add arr[r], possibly updating last and prev.
Otherwise, if arr[r] equals last, we don't need to do anything. If arr[r] equals prev, we need to swap prev and last. If it equals neither of the two, we need to move the left pointer l by tracing back from r-1 until we find an element that is not equal to last, and then update last and prev.
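Since the question asks for Java, here is a rough Java sketch of the same two-pointer idea. It swaps the prev/last bookkeeping described above for a small count map, which is a different technique that yields the same window; the class name TwoValueSlice and the method longest are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sliding-window helper; names are assumptions.
class TwoValueSlice {

    // Length of the longest contiguous slice containing at most two distinct values.
    static int longest(int[] arr) {
        Map<Integer, Integer> counts = new HashMap<>(); // value -> occurrences inside the window
        int best = 0;
        for (int left = 0, right = 0; right < arr.length; right++) {
            counts.merge(arr[right], 1, Integer::sum);
            // shrink from the left until only two distinct values remain
            while (counts.size() > 2) {
                if (counts.merge(arr[left], -1, Integer::sum) == 0) {
                    counts.remove(arr[left]);
                }
                left++;
            }
            best = Math.max(best, right - left + 1);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest(new int[] {1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 6, 2, 1, 8}));
        System.out.println(longest(new int[] {53, 800, 0, 0, 0, 356, 8988, 1, 1}));
    }
}
```

Both pointers only move forward, so each element is added and removed at most once and the scan is linear in the array length.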

Finding shortest possible substring that contains a String

This was a question asked in a recent programming interview.
Given a random string S and another string T with unique elements, find the minimum consecutive sub-string of S such that it contains all the elements in T.
Say,
S='adobecodebanc'
T='abc'
Answer='banc'
I've come up with a solution,
public static String completeSubstring(String T, String S){
String minSub = T;
StringBuilder sb = new StringBuilder();
for (int i = 0; i <T.length()-1; i++) {
for (int j = i + 1; j <= T.length() ; j++) {
String sub = T.substring(i,j);
if(stringContains(sub, S)){
if(sub.length() < minSub.length()) minSub = sub;
}
}
}
return minSub;
}
private static boolean stringContains(String t, String s){
//if(t.length() <= s.length()) return false;
int[] arr = new int[256];
for (int i = 0; i <t.length() ; i++) {
char c = t.charAt(i);
arr[c -'a'] = 1;
}
boolean found = true;
for (int i = 0; i <s.length() ; i++) {
char c = s.charAt(i);
if(arr[c - 'a'] != 1){
found = false;
break;
}else continue;
}
return found;
}
This algorithm has O(n^3) complexity, which naturally isn't great. Can someone suggest a better algorithm?
Here's the O(N) solution.
The important thing to note re: complexity is that each unit of work involves incrementing either start or end, they don't decrease, and the algorithm stops before they both get to the end.
public static String findSubString(String s, String t)
{
//algorithm moves a sliding "current substring" through s
//in this map, we keep track of the number of occurrences of
//each target character there are in the current substring
Map<Character,int[]> counts = new HashMap<>();
for (char c : t.toCharArray())
{
counts.put(c,new int[1]);
}
//how many target characters are missing from the current substring
//current substring is initially empty, so all of them
int missing = counts.size();
//don't waste my time
if (missing<1)
{
return "";
}
//best substring found
int bestStart = -1, bestEnd = -1;
//current substring
int start=0, end=0;
while (end<s.length())
{
//expand the current substring at the end
int[] cnt = counts.get(s.charAt(end++));
if (cnt!=null)
{
if (cnt[0]==0)
{
--missing;
}
cnt[0]+=1;
}
//while the current substring is valid, remove characters
//at the start to see if a shorter substring that ends at the
//same place is also valid
while(start<end && missing<=0)
{
//current substring is valid
if (end-start < bestEnd-bestStart || bestEnd<0)
{
bestStart = start;
bestEnd = end;
}
cnt = counts.get(s.charAt(start++));
if (cnt != null)
{
cnt[0]-=1;
if (cnt[0]==0)
{
++missing;
}
}
}
//current substring is no longer valid. we'll add characters
//at the end until we get another valid one
//note that we don't need to add back any start character that
//we just removed, since we already tried the shortest valid string
//that starts at start-1
}
return(bestStart>=0 ? s.substring(bestStart,bestEnd) : null); // bestStart stays -1 when no valid substring exists
}
I know that there already is an adequate O(N) complexity answer, but I tried to figure it out on my own without looking it up, just because it's a fun problem to solve and thought I would share. Here's the O(N) solution that I came up with:
public static String completeSubstring(String S, String T){
int min = S.length()+1, index1 = -1, index2 = -1;
ArrayList<ArrayList<Integer>> index = new ArrayList<ArrayList<Integer>>();
HashSet<Character> targetChars = new HashSet<Character>();
for(char c : T.toCharArray()) targetChars.add(c);
//reduce initial sequence to only target chars and keep track of index
//Note that the resultant string does not allow the same char to be consecutive
StringBuilder filterS = new StringBuilder();
for(int i = 0, s = 0 ; i < S.length() ; i++) {
char c = S.charAt(i);
if(targetChars.contains(c)) {
if(s > 0 && filterS.charAt(s-1) == c) {
index.get(s-1).add(i);
} else {
filterS.append(c);
index.add(new ArrayList<Integer>());
index.get(s).add(i);
s++;
}
}
}
//Not necessary to use regex, loops are fine, but for readability sake
String regex = "([abc])((?!\\1)[abc])((?!\\1)(?!\\2)[abc])";
Matcher m = Pattern.compile(regex).matcher(filterS.toString());
for(int i = 0, start = -1, p1, p2, tempMin, charSize = targetChars.size() ; m.find(i) ; i = start+1) {
start = m.start();
ArrayList<Integer> first = index.get(start);
p1 = first.get(first.size()-1);
p2 = index.get(start+charSize-1).get(0);
tempMin = p2-p1;
if(tempMin < min) {
min = tempMin;
index1 = p1;
index2 = p2;
}
}
return S.substring(index1, index2+1);
}
I'm pretty sure the complexity is O(N), please correct if I'm wrong
An alternative implementation of the O(N) algorithm proposed by @MattTimmermans, which uses a Map<Integer, Integer> to count occurrences and a Set<Integer> to store the chars from T that are present in the current substring:
public static String completeSubstring(String s, String t) {
Map<Integer, Integer> occ
= t.chars().boxed().collect(Collectors.toMap(c -> c, c -> 0));
Set<Integer> found = new HashSet<>(); // characters from T found in current match
int start = 0; // current match
int bestStart = Integer.MIN_VALUE, bestEnd = -1;
for (int i = 0; i < s.length(); i++) {
int ci = s.charAt(i); // current char
if (!occ.containsKey(ci)) // not from T
continue;
occ.put(ci, occ.get(ci) + 1); // add occurrence
found.add(ci);
for (int j = start; j <= i; j++) { // try to reduce current match; j may reach i so that start always lands on a char whose count is 1
int cj = s.charAt(j);
Integer c = occ.get(cj);
if (c != null) {
if (c == 1) { // cannot reduce anymore
start = j;
break;
} else
occ.put(cj, c - 1); // remove occurrence
}
}
if (found.size() == occ.size() // all chars found
&& (i - start < bestEnd - bestStart)) {
bestStart = start;
bestEnd = i;
}
}
return bestStart < 0 ? null : s.substring(bestStart, bestEnd + 1);
}

ArrayIndexOutOfBoundsException coming in this array using java?

I have an array of numbers in Java and need to output the ones that consist of only duplicated digits. However, my code throws an ArrayIndexOutOfBoundsException. Where is the problem?
int[] inputValues= {122, 2, 22, 11, 234, 333, 000, 5555, 8, 9, 99};
for (int i = 0; i < inputValues.length; i++) {
int numberLength = Integer.toString(inputValues[i]).length();
// System.out.println(numberLength);
if (numberLength > 1) { //more than one digit in the number
String s1 = Integer.toString(inputValues[i]);
String[] numberDigits = s1.split("");
for (int j = 1, k = 1; j < numberDigits.length; k++) {
if (numberDigits[j].equals(numberDigits[k + 1])) {
System.out.println("Duplicate values are:");
//I need to print 22,11,333,000,5555,99
}
}
}
}
There is no condition to stop the inner loop when k gets too big. j never changes in the inner loop, so j < numberDigits.length will either always be true or always be false.
public static void main(String[] args) {
int[] inputValues={122,2,22,11,234,333,000,5555,8,9,99,1000};
System.out.println("Duplicate values are:");
for (int i = 0; i < inputValues.length; i++) {
String strNumber = Integer.toString(inputValues[i]);// get string from array
if(strNumber.length()>1) // string length must be greater than 1
{
Character firstchar =strNumber.charAt(0); //get first char of string
String strchker =strNumber.replaceAll(firstchar.toString(), ""); //replace it with blank
if(strchker.length()==0) // for duplicate values length must be 0
{
System.out.println(strNumber);
}
}
}
/*
* output will be
* Duplicate values are:
22
11
333
5555
99
*
*
*/
}
This is what you want.....
This line is the culprit here -
for (int j = 1, k = 1; j < numberDigits.length; k++) {
if (numberDigits[j].equals(numberDigits[k + 1])) {
System.out.println("Duplicate values are:");//i need to print 22,11,333,000,5555,99,etc.
}
}
The loop has a condition that's always true as value of j is always 1. Since k keeps on increasing by 1 for each iteration ( which are infinite btw ), the index goes out of array bounds.
Try -
boolean isDuplicate = true; // declared outside the loop so it is still in scope afterwards
for (int j = 0, k = 1; k < numberDigits.length; k++) {
if (!numberDigits[j].equals(numberDigits[k])) {
isDuplicate = false;
break;
}
}
if( isDuplicate ) {
System.out.println("Duplicate values are:"+inputValues[i]);
}
Sorry for joining the party late. I think the following is the piece of code you're looking for:
private int[] getDuplicate(int[] arr) {
ArrayList<Integer> duplicate = new ArrayList<Integer>();
for (int item : arr) {
if(item > 9 && areDigitsSame(item)) {
duplicate.add(item);
}
}
int duplicateDigits[] = new int[duplicate.size()];
int index = 0;
for (Integer integer : duplicate) {
duplicateDigits[index ++] = integer;
}
return duplicateDigits;
}
public boolean areDigitsSame(int item) {
int num = item;
int previousDigit = item % 10;
while (num != 0) {
int digit = num % 10;
if (previousDigit != digit) {
return false;
}
num /= 10;
}
return true;
}
Now , use it as below
int[]inputValues={122,2,22,11,234,333,000,5555,8,9,99};
int[] duplicates = getDuplicate(inputValues);
That's all
Enjoy!
public static void main(String[] args) {
String[] inputValues={"122","2","22","11","234","333","000","5555","8","9","99"};
System.out.println("Duplicate values are:");
for (int i = 0; i < inputValues.length; i++) {
String strNumber = inputValues[i];// get string from array
if(strNumber.length()>1) // string length must be greater than 1
{
Character firstchar =strNumber.charAt(0); //get first char of string
String strchker =strNumber.replace(firstchar.toString(), ""); //remove every occurrence of the first char
if(strchker.isEmpty()) //if all digits are the same nothing is left (comparing against 0 would also accept e.g. "100")
{
System.out.println(strNumber);
}
}
}
}
// /// result will be
/* Duplicate values are:
22
11
333
000
5555
99
*/
If you want an int array, then you will not be able to get "000" as a duplicate value.
s1.split("") results in [, 1, 2, 2] for 122 (on Java 7 and earlier; since Java 8 the leading empty string is no longer produced). Hence numberDigits.length is 4. The loop condition j < numberDigits.length is always true because j stays at 1, so numberDigits[k + 1] is eventually evaluated for index 4, which does not exist in [, 1, 2, 2].
Another point worth noting is that int[] inputValues will always store 000 as 0.
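Both points can be sidestepped by keeping the values as strings and comparing characters directly, without split(). The following is a small sketch under that assumption; RepeatedDigits and allSame are hypothetical names, not taken from any answer here.

```java
// Hypothetical helper; works on the textual form so that "000" keeps its digits.
class RepeatedDigits {

    // True when the text has at least two characters and all of them are equal.
    static boolean allSame(String digits) {
        if (digits.length() < 2) {
            return false;
        }
        for (int i = 1; i < digits.length(); i++) {
            if (digits.charAt(i) != digits.charAt(0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // the int literal 000 is just the number 0, so its digits are lost
        System.out.println(Integer.toString(000));
        String[] inputValues = {"122", "2", "22", "11", "234", "333", "000", "5555", "8", "9", "99"};
        for (String value : inputValues) {
            if (allSame(value)) {
                System.out.println(value);
            }
        }
    }
}
```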
The method below takes an integer and returns true or false depending on whether all of its digits are the same. It uses the XOR operator to compare the digits.
private static boolean exorEveryCharacter(int currentValue) {
int result = 0;
int previousNumber = -1;
while (currentValue != 0) {
int currentNumber = currentValue % 10;
if(previousNumber == -1){
previousNumber = currentNumber;
}
else{
// accumulate with |= so that one differing digit cannot be overwritten by a later matching one
result |= previousNumber ^ currentNumber;
}
currentValue /= 10;
}
return result == 0;
}
