What is the Big O notation of a program? - java

I am trying to determine the algorithmic complexity of this program:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class SuffixArray
{
private String[] text;
private int length;
private int[] index;
private String[] suffix;
public SuffixArray(String text)
{
this.text = new String[text.length()];
for (int i = 0; i < text.length(); i++)
{
this.text[i] = text.substring(i, i+1);
}
this.length = text.length();
this.index = new int[length];
for (int i = 0; i < length; i++)
{
index[i] = i;
}
suffix = new String[length];
}
public void createSuffixArray()
{
for(int index = 0; index < length; index++)
{
String text = "";
for (int text_index = index; text_index < length; text_index++)
{
text+=this.text[text_index];
}
suffix[index] = text;
}
int back;
for (int iteration = 1; iteration < length; iteration++)
{
String key = suffix[iteration];
int keyindex = index[iteration];
for (back = iteration - 1; back >= 0; back--)
{
if (suffix[back].compareTo(key) > 0)
{
suffix[back + 1] = suffix[back];
index[back + 1] = index[back];
}
else
{
break;
}
}
suffix[ back + 1 ] = key;
index[back + 1 ] = keyindex;
}
System.out.println("SUFFIX \t INDEX");
for (int iterate = 0; iterate < length; iterate++)
{
System.out.println(suffix[iterate] + "\t" + index[iterate]);
}
}
public static void main(String...arg)throws IOException
{
String text = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter the Text String ");
text = reader.readLine();
SuffixArray suffixarray = new SuffixArray(text);
suffixarray.createSuffixArray();
}
}
I have done some research on Wikipedia about Big-O notation and have run the code with strings of different sizes. Based on the time it takes to run with different-sized strings, I feel its complexity might be O(n^2). How can I know for sure?
Any help would be greatly appreciated.

The complexity of your createSuffixArray method is O(n^2), and you can determine that by examining its three looping blocks (counting loop iterations and treating the string operations inside them as constant-time). Since you said you have already read the Wikipedia article on Big-O notation, focus on its properties: they are the key to applying the right logic and computing the complexity of an algorithm.
In your first block, the inner loop, which iterates up to length times, is itself repeated length times, yielding a complexity of O(n^2) due to the Product Property of Big-O notation.
for(int index = 0; index < length; index++)
{
String text = "";
for (int text_index = index; text_index < length; text_index++)
{
text+=this.text[text_index];
}
suffix[index] = text;
}
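To make that concrete (writing n for length): for index = 0 the inner loop runs n times, for index = 1 it runs n - 1 times, and so on, so the total number of inner-loop iterations is
n + (n - 1) + ... + 2 + 1 = n(n + 1)/2
which grows as O(n^2).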
In your second block, although the inner loop does not perform length iterations in the early passes, its trip count still tends to O(n), so the block again has an overall complexity of O(n^2) due to the Product Property.
for (int iteration = 1; iteration < length; iteration++)
{
String key = suffix[iteration];
int keyindex = index[iteration];
for (back = iteration - 1; back >= 0; back--)
{
if (suffix[back].compareTo(key) > 0)
{
suffix[back + 1] = suffix[back];
index[back + 1] = index[back];
}
else
{
break;
}
}
suffix[ back + 1 ] = key;
index[back + 1 ] = keyindex;
}
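The same counting applies here in the worst case: if the suffixes happen to be generated in reverse sorted order, the inner loop runs iteration times on each pass, for a total of 1 + 2 + ... + (n - 1) = n(n - 1)/2 steps, which is again O(n^2).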
Your third and last block simply iterates length times, giving us a linear complexity of O(n).
for (int iterate = 0; iterate < length; iterate++)
{
System.out.println(suffix[iterate] + "\t" + index[iterate]);
}
At this point, to get the complexity of the whole method, we gather the three sub-complexities and apply the Sum Property, which yields O(n^2) + O(n^2) + O(n) = O(n^2).
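If you also want to confirm this empirically (which is essentially what you were already doing by hand), time createSuffixArray on inputs of doubling size and check that the running time roughly quadruples at each step, as a quadratic algorithm should. Here is a rough sketch, assuming the SuffixArray class above is on the classpath and that you temporarily comment out its println loop so printing does not dominate the measurement (the class name TimingCheck is just for illustration):
public class TimingCheck {
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        for (int n = 1000; n <= 16000; n *= 2) {
            // build a random lowercase string of length n
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            SuffixArray suffixArray = new SuffixArray(sb.toString());
            long start = System.nanoTime();
            suffixArray.createSuffixArray();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("n = " + n + " -> " + elapsedMs + " ms");
        }
    }
}
Timings are noisy, so treat this as a sanity check rather than a proof; the loop-counting argument above is the real answer.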

Related

Identify maximum distance element from String array in Java

I have a requirement where I have to find the element of an array whose occurrences are farthest apart.
If two terms were searched equally often, the returned term should be the one with the longest distance between when it was first and last searched.
Given the following array
[‘C++’, ‘Java’, ‘C#’, ‘C#’, ‘Java’, ‘Python’, ‘C#’, ‘Java’]
Our Program should return “Most searched term is Java”
Note: Both ‘Java’ and ‘C#’ were searched 3 times, but the program should return ‘Java’ because the distance between its first and last search is greater than that of ‘C#’.
I tried searching; I can find the distance (6) but couldn't return the value (Java).
Thank you so much everyone for your input. I did this successfully in JS. I was only looking for Java help. I can post my JS code.
Thanks
class Wou {
    public static void main(String args[]) {
        // Join all the terms into one string; distances are measured in characters of that string.
        String s = "";
        for (String x : args)
            s += x;
        int h = 0;          // maximum distance between first and last occurrence found so far
        String word = "";   // term with that maximum distance
        int len = args.length;
        for (int i = 0; i < len; i++) {
            // indexOf/lastIndexOf do the searching; note that this measures character
            // positions in the joined string, not positions in the original array.
            int cc = s.lastIndexOf(args[i]) - s.indexOf(args[i]);
            if (cc > h) {
                h = cc;
                word = args[i];
            }
        }
        System.out.println("Most searched term is " + word);
    }
}
This is my code which I tried -
String[] a ={"C++", "Java", "C#", "C#", "Java", "Python", "C#", "Java"};
String val ="";
int maxDist = 0;
for(int i=0; i<a.length; i++)
{
int firstOcc = -1;
int lastOcc = -1;
for(int j=0; j<a.length; j++)
{
if(a[i] == a[j])
{
//val=a[j];
if(firstOcc == -1)
firstOcc = lastOcc = j;
//val=a[j];
}
else
{
lastOcc = j;
val=a[j];
}
}
if(lastOcc - firstOcc > maxDist)
maxDist = lastOcc - firstOcc;
}
System.out.println("Most searched term is " + val);
}
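For reference, here is a minimal Java sketch of the interpretation used above: record the first and last array index of each term and keep the term with the largest gap between them (my own illustration, only checked against the sample input):
import java.util.HashMap;
import java.util.Map;

public class MaxDistanceTerm {
    public static void main(String[] args) {
        String[] searches = {"C++", "Java", "C#", "C#", "Java", "Python", "C#", "Java"};
        Map<String, Integer> first = new HashMap<>();   // first index where each term appears
        Map<String, Integer> last = new HashMap<>();    // last index where each term appears
        for (int i = 0; i < searches.length; i++) {
            first.putIfAbsent(searches[i], i);
            last.put(searches[i], i);
        }
        String best = "";
        int maxDist = -1;
        for (String term : first.keySet()) {
            int dist = last.get(term) - first.get(term);
            if (dist > maxDist) {                       // "Java": 7 - 1 = 6 beats "C#": 6 - 2 = 4
                maxDist = dist;
                best = term;
            }
        }
        System.out.println("Most searched term is " + best);
    }
}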

How could I improve the speed/performance for this problem, Java

I saw this challenge for beginners on https://www.topcoder.com/ and I really wanted to complete it. I got so close after many failures, but now I'm stuck and don't know what to do anymore. Here is what I mean.
Question:
Read the input one line at a time and output the current line if and only if you have already read at least 1000 lines greater than the current line and at least 1000 lines less than the current line. (Again, greater than and less than are with respect to the ordering defined by String.compareTo().)
Link to the Challenge
My Solution:
public static void doIt(BufferedReader r, PrintWriter w) throws IOException {
SortedSet<String> linesThatHaveBeenRead = new TreeSet<>();
int lessThan =0;
int greaterThan =0;
Iterator<String> itr;
for (String currentLine = r.readLine(); currentLine != null; currentLine = r.readLine()){
itr = linesThatHaveBeenRead.iterator();
while(itr.hasNext()){
String theCurrentLineInTheSet = itr.next();
if(theCurrentLineInTheSet.compareTo(currentLine) == -1)++lessThan;
else if(theCurrentLineInTheSet.compareTo(currentLine) == 1)++greaterThan;
}
if(lessThan >= 1000 && greaterThan >= 1000){
w.println(currentLine);
lessThan = 0;
greaterThan =0;
}
linesThatHaveBeenRead.add(currentLine);
}
}
PROBLEM
I think the problem with my solution is that I'm using nested loops, which makes it a lot slower, but I've tried other ways and none of them worked. At this point I'm stuck. The whole point of this challenge is to use the right data structure for the problem.
GOAL:
The goal is to use the most efficient data-structure for this problem.
Let me try to present just an accessible refinement of what to do.
public static void
doIt(java.io.BufferedReader r, java.io.PrintWriter w)
throws java.io.IOException {
feedNonExtremes(r, (line) -> { w.println(line);}, 1000, 1000);
}
/** Read <code>r</code> one line at a time and
 * output the current line if and only if there already were<br/>
 * at least <code>nHigh</code> lines greater than the current line<br/>
 * and at least <code>nLow</code> lines less than the current line.<br/>
 * @param r to read lines from
 * @param sink to feed lines to
 * @param nLow number of lines comparing too small to process
 * @param nHigh number of lines comparing too great to process
 */
static void feedNonExtremes(java.io.BufferedReader r,
Consumer<String> sink, int nLow, int nHigh) {
// collect nLow+nHigh lines into firstLowHigh; instantiate
// - a PriorityQueue(firstLowHigh) highest
// - a PriorityQueue(nLow, (a, b) -> String.compareTo(b, a)) lowest
// remove() nLow elements from highest and insert each into lowest
// for each remaining line
// if greater than the head of highest
// add to highest and remove head
// else if smaller than the head of lowest
// add to lowest and remove head
// else feed to sink
}
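For what it's worth, here is one way that outline might be filled in, using the two PriorityQueues described in the comments. This is my own sketch (assume it is dropped into the same class as doIt above); it has not been run against the challenge's grader, and lines that compare equal to the window bounds are treated loosely:
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

static void feedNonExtremes(BufferedReader r, Consumer<String> sink,
                            int nLow, int nHigh) throws IOException {
    // Warm-up: collect the first nLow + nHigh lines into a min-heap.
    PriorityQueue<String> highest = new PriorityQueue<>();
    // Max-heap that will hold the nLow smallest lines seen so far.
    PriorityQueue<String> lowest = new PriorityQueue<>(nLow, Comparator.reverseOrder());
    String line;
    while (highest.size() < nLow + nHigh && (line = r.readLine()) != null) {
        highest.add(line);
    }
    // Move the nLow smallest lines into "lowest"; "highest" keeps the nHigh largest.
    for (int i = 0; i < nLow && !highest.isEmpty(); i++) {
        lowest.add(highest.poll());
    }
    if (lowest.size() < nLow || highest.size() < nHigh) {
        return; // fewer than nLow + nHigh lines in total: no line can ever qualify
    }
    while ((line = r.readLine()) != null) {
        if (line.compareTo(highest.peek()) > 0) {
            // Larger than the smallest of the top group: fewer than nHigh greater lines read so far.
            highest.add(line);
            highest.poll();
        } else if (line.compareTo(lowest.peek()) < 0) {
            // Smaller than the largest of the bottom group: fewer than nLow smaller lines read so far.
            lowest.add(line);
            lowest.poll();
        } else {
            // At least nHigh greater and nLow smaller lines have already been read.
            sink.accept(line);
        }
    }
}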
I made you a little example with binary search, now in Java code. It only uses binary search when newLine falls within the bounds of one of the two sorted boundary lists.
public static void main(String[] args) {
// Create random lines
ArrayList<String> lines = new ArrayList<String>();
Random rn = new Random();
for (int i = 0; i < 50000; i++) {
int lenght = rn.nextInt(100);
char[] newString = new char[lenght];
for (int j = 0; j < lenght; j++) {
newString[j] = (char) rn.nextInt(255);
}
lines.add(new String(newString));
}
// Here starts logic
ArrayList<String> lowerCompared = new ArrayList<String>();
ArrayList<String> higherCompared = new ArrayList<String>();
int lowBoundry = 1000, highBoundry = 1000;
int k = 0;
int firstLimit = Math.min(lowBoundry, highBoundry);
// first x lines sorter equal
for (; k < firstLimit; k++) {
int index = Collections.binarySearch(lowerCompared, lines.get(k));
if (index < 0)
index = ~index;
lowerCompared.add(index, lines.get(k));
higherCompared.add(index, lines.get(k));
}
for (; k < lines.size(); k++) {
String newLine = lines.get(k);
boolean lowBS = newLine.compareTo(lowerCompared.get(lowBoundry - 1)) < 0;
boolean highBS = newLine.compareTo(higherCompared.get(0)) > 0;
if (lowerCompared.size() == lowBoundry && higherCompared.size() == highBoundry && !lowBS && !highBS) {
System.out.println("Time to print: " + newLine);
continue;
}
if (lowBS) {
int lowerIndex = Collections.binarySearch(lowerCompared, newLine);
if (lowerIndex < 0)
lowerIndex = ~lowerIndex;
lowerCompared.add(lowerIndex, newLine);
if (lowerCompared.size() > lowBoundry)
lowerCompared.remove(lowBoundry);
}
if (highBS) {
int higherIndex = Collections.binarySearch(higherCompared, newLine);
if (higherIndex < 0)
higherIndex = ~higherIndex;
higherCompared.add(higherIndex, newLine);
if (higherCompared.size() > highBoundry)
higherCompared.remove(0);
}
}
}
You need to implement binary search and also handle duplicates.
I've written a code sample here that does what you want (it may contain bugs).
public class CheckRead1000 {
public static void main(String[] args) {
// generate strings in revert order to get the worse case
List<String> aaa = new ArrayList<String>();
for (int i = 50000; i > 0; i--) {
aaa.add("some string 123456789" + i);
}
// fast solution
ArrayList<String> sortedLines = new ArrayList<>();
long st1 = System.currentTimeMillis();
for (String a : aaa) {
checkIfRead1000MoreAndLess(sortedLines, a);
}
System.out.println(System.currentTimeMillis() - st1);
// doIt solution
TreeSet<String> linesThatHaveBeenRead = new TreeSet<>();
long st2 = System.currentTimeMillis();
for (String a : aaa) {
doIt(linesThatHaveBeenRead, a);
}
System.out.println(System.currentTimeMillis() - st2);
}
// solution doIt
public static void doIt(SortedSet<String> linesThatHaveBeenRead, String currentLine) {
int lessThan = 0;
int greaterThan = 0;
Iterator<String> itr = linesThatHaveBeenRead.iterator();
while (itr.hasNext()) {
String theCurrentLineInTheSet = itr.next();
if (theCurrentLineInTheSet.compareTo(currentLine) == -1) ++lessThan;
else if (theCurrentLineInTheSet.compareTo(currentLine) == 1) ++greaterThan;
}
if (lessThan >= 1000 && greaterThan >= 1000) {
// System.out.println(currentLine);
lessThan = 0;
greaterThan = 0;
}
linesThatHaveBeenRead.add(currentLine);
}
// returns true if we have already read at least 1000 strings greater and 1000 strings less than the given string
private static boolean checkIfRead1000MoreAndLess(List<String> sortedLines, String newLine) {
//adding string to list and calculating its index and the last search range
int indexes[] = addNewString(sortedLines, newLine);
int index = indexes[0]; // index of element
int low = indexes[1];
int high = indexes[2];
//we need to check if this string already was in list for instance
// 1,2,3,4,5,5,5,5,5,6,7 for 5 we need to count 'less' as 4 and 'more' is 2
int highIndex = index;
for (int i = highIndex + 1; i < high; i++) {
if (sortedLines.get(i).equals(newLine)) {
highIndex++;
} else {
//no more duplicates
break;
}
}
int lowIndex = index;
for (int i = lowIndex - 1; i > low; i--) {
if (sortedLines.get(i).equals(newLine)) {
lowIndex--;
} else {
//no more duplicates
break;
}
}
// just calculating how many we did read more and less
if (sortedLines.size() - highIndex - 1 > 1000 && lowIndex > 1000) {
return true;
}
return false;
}
// simple binary search will insert string and return its index and ranges in sorted list
// first int is index,
// second int is start of range - will be used to find duplicates,
// third int is end of range - will be used to find duplicates,
private static int[] addNewString(List<String> sortedLines, String newLine) {
if (sortedLines.isEmpty()) {
sortedLines.add(newLine);
return new int[]{0, 0, 0};
}
// int index = Integer.MAX_VALUE;
int low = 0;
int high = sortedLines.size() - 1;
int mid = 0;
while (low <= high) {
mid = (low + high) / 2;
if (sortedLines.get(mid).compareTo(newLine) < 0) {
low = mid + 1;
} else if (sortedLines.get(mid).compareTo(newLine) > 0) {
high = mid - 1;
} else if (sortedLines.get(mid).compareTo(newLine) == 0) {
// index = mid;
break;
}
if (low > high) {
mid = low;
}
}
if (mid == sortedLines.size()) {
sortedLines.add(newLine);
} else {
sortedLines.add(mid, newLine);
}
return new int[]{mid, low, high};
}
}

Most efficient way to search for unknown patterns in a string?

I am trying to find patterns that:
occur more than once
are more than 1 character long
are not substrings of any other known pattern
without knowing any of the patterns that might occur.
For example:
The string "the boy fell by the bell" would return 'ell', 'the b', 'y '.
The string "the boy fell by the bell, the boy fell by the bell" would return 'the boy fell by the bell'.
Using double for-loops, it can be brute forced very inefficiently:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
However, this is incredibly slow when searching large strings with tons of patterns.
You can build a suffix tree for your string in linear time:
https://en.wikipedia.org/wiki/Suffix_tree
The patterns you are looking for are the strings corresponding to internal nodes that have only leaf children.
You could use n-grams to find patterns in a string; scanning the string for n-grams takes O(n) time. When you find a substring using an n-gram, put it into a hash table with a count of how many times that substring occurs in the string. When you're done, search the hash table for counts greater than 1 to find the recurring patterns in the string.
For example, in the string "the boy fell by the bell, the boy fell by the bell" using a 6-gram will find the substring "the boy fell by the bell". A hash table entry with that substring will have a count of 2 because it occurred twice in the string. Varying the number of words in the n-gram will help you discover different patterns in the string.
Dictionary<string, int> dict = new Dictionary<string, int>();
int count = 0;
int ngramcount = 6;
string substring = "";
// Add entries to the hash table
while (count < str.Length) {
    // copy the words into the substring
    substring = "";
    while (ngramcount > 0 && count < str.Length) {
        substring += str[count];
        if (str[count] == ' ')
            ngramcount--;
        count++;
    }
    ngramcount = 6;
    substring = substring.Trim(); // get rid of the trailing blank in the substring
    // Update the dictionary (hash table) with the substring
    if (dict.ContainsKey(substring)) { // substring is already in the hash table, so increment the count
        int hashCount = dict[substring];
        hashCount++;
        dict[substring] = hashCount;
    }
    else
        dict[substring] = 1;
}
// Find the most commonly occurring pattern in the string
// by searching the hash table for the greatest count.
int maxCount = 0;
string mostCommonPattern = "";
foreach (KeyValuePair<string, int> pair in dict) {
    if (pair.Value > maxCount) {
        maxCount = pair.Value;
        mostCommonPattern = pair.Key;
    }
}
I've written this just for fun. I hope I have understood the problem correctly and that this is valid and fast enough; if not, please go easy on me :) I might optimize it a little more, I guess, if someone finds it useful.
private static IEnumerable<string> getPatterns(string txt)
{
char[] arr = txt.ToArray();
BitArray ba = new BitArray(arr.Length);
for (int shingle = getMaxShingleSize(arr); shingle >= 2; shingle--)
{
char[] arr1 = new char[shingle];
int[] indexes = new int[shingle];
HashSet<int> hs = new HashSet<int>();
Dictionary<int, int[]> dic = new Dictionary<int, int[]>();
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
{
int index = i + j;
arr1[j] = arr[index];
indexes[j] = index;
}
int h = getHashCode(arr1);
if (hs.Add(h))
{
int[] indexes1 = new int[indexes.Length];
Buffer.BlockCopy(indexes, 0, indexes1, 0, indexes.Length * sizeof(int));
dic.Add(h, indexes1);
}
else
{
bool exists = false;
foreach (int index in indexes)
if (ba.Get(index))
{
exists = true;
break;
}
if (!exists)
{
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
if (ba.Get(index))
{
exists = true;
break;
}
}
if (!exists)
{
foreach (int index in indexes)
ba.Set(index, true);
int[] indexes1 = dic[h];
if (indexes1 != null)
foreach (int index in indexes1)
ba.Set(index, true);
dic[h] = null;
yield return new string(arr1);
}
}
}
}
}
private static int getMaxShingleSize(char[] arr)
{
for (int shingle = 2; shingle <= arr.Length / 2 + 1; shingle++)
{
char[] arr1 = new char[shingle];
HashSet<int> hs = new HashSet<int>();
bool noPattern = true;
for (int i = 0, count = arr.Length - shingle; i <= count; i++)
{
for (int j = 0; j < shingle; j++)
arr1[j] = arr[i + j];
int h = getHashCode(arr1);
if (!hs.Add(h))
{
noPattern = false;
break;
}
}
if (noPattern)
return shingle - 1;
}
return -1;
}
private static int getHashCode(char[] arr)
{
unchecked
{
int hash = (int)2166136261;
foreach (char c in arr)
hash = (hash * 16777619) ^ c.GetHashCode();
return hash;
}
}
Edit
My previous code has serious problems. This one is better:
private static IEnumerable<string> getPatterns(string txt)
{
Dictionary<int, int> dicIndexSize = new Dictionary<int, int>();
for (int shingle = 2, count0 = txt.Length / 2 + 1; shingle <= count0; shingle++)
{
Dictionary<string, int> dic = new Dictionary<string, int>();
bool patternExists = false;
for (int i = 0, count = txt.Length - shingle; i <= count; i++)
{
string sub = txt.Substring(i, shingle);
if (!dic.ContainsKey(sub))
dic.Add(sub, i);
else
{
patternExists = true;
int index0 = dic[sub];
if (index0 >= 0)
{
dicIndexSize[index0] = shingle;
dic[sub] = -1;
}
}
}
if (!patternExists)
break;
}
List<int> lst = dicIndexSize.Keys.ToList();
lst.Sort((a, b) => dicIndexSize[b].CompareTo(dicIndexSize[a]));
BitArray ba = new BitArray(txt.Length);
foreach (int i in lst)
{
bool ok = true;
int len = dicIndexSize[i];
for (int j = i, max = i + len; j < max; j++)
{
if (ok) ok = !ba.Get(j);
ba.Set(j, true);
}
if (ok)
yield return txt.Substring(i, len);
}
}
The text of this book took 3.4 sec on my computer.
Suffix arrays are the right idea, but there's a non-trivial piece missing, namely identifying what are known in the literature as "supermaximal repeats". Here's a GitHub repo with working code: https://github.com/eisenstatdavid/commonsub . Suffix array construction uses the SAIS library, vendored in as a submodule. The supermaximal repeats are found using a corrected version of the pseudocode for findsmaxr from "Efficient repeat finding via suffix arrays" (Becher–Deymonnaz–Heiber).
static void FindRepeatedStrings(void) {
// findsmaxr from https://arxiv.org/pdf/1304.0528.pdf
printf("[");
bool needComma = false;
int up = -1;
for (int i = 1; i < Len; i++) {
if (LongCommPre[i - 1] < LongCommPre[i]) {
up = i;
continue;
}
if (LongCommPre[i - 1] == LongCommPre[i] || up < 0) continue;
for (int k = up - 1; k < i; k++) {
if (SufArr[k] == 0) continue;
unsigned char c = Buf[SufArr[k] - 1];
if (Set[c] == i) goto skip;
Set[c] = i;
}
if (needComma) {
printf("\n,");
}
printf("\"");
for (int j = 0; j < LongCommPre[up]; j++) {
unsigned char c = Buf[SufArr[up] + j];
if (iscntrl(c)) {
printf("\\u%.4x", c);
} else if (c == '\"' || c == '\\') {
printf("\\%c", c);
} else {
printf("%c", c);
}
}
printf("\"");
needComma = true;
skip:
up = -1;
}
printf("\n]\n");
}
Here's a sample output on the text of the first paragraph:
Davids-MBP:commonsub eisen$ ./repsub input
["\u000a"
," S"
," as "
," co"
," ide"
," in "
," li"
," n"
," p"
," the "
," us"
," ve"
," w"
,"\""
,"–"
,"("
,")"
,". "
,"0"
,"He"
,"Suffix array"
,"`"
,"a su"
,"at "
,"code"
,"com"
,"ct"
,"do"
,"e f"
,"ec"
,"ed "
,"ei"
,"ent"
,"ere's a "
,"find"
,"her"
,"https://"
,"ib"
,"ie"
,"ing "
,"ion "
,"is"
,"ith"
,"iv"
,"k"
,"mon"
,"na"
,"no"
,"nst"
,"ons"
,"or"
,"pdf"
,"ri"
,"s are "
,"se"
,"sing"
,"sub"
,"supermaximal repeats"
,"te"
,"ti"
,"tr"
,"ub "
,"uffix arrays"
,"via"
,"y, "
]
I would use the Knuth–Morris–Pratt algorithm (linear time complexity, O(n)) to find substrings. I would try to find the largest substring pattern, remove it from the input string, then try to find the second largest, and so on. I would do something like this:
string pattern = input.substring(0, input.length / 2);
string toMatchString = input.substring(pattern.length, input.length - 1);
List<string> matches = new List<string>();
while (pattern.length > 0)
{
    int index = KMP(pattern, toMatchString);
    if (index > 0)
    {
        matches.Add(pattern);
        // remove the matched pattern occurrences from the input string
        // I would do something like this:
        // 0 to pattern.length gets removed
        // check for all occurrences of pattern in toMatchString and remove them
        // get the remaining, shrunken input, reassign values for pattern & toMatchString
        // keep looking for the next largest substring
    }
    else
    {
        pattern = input.substring(0, pattern.length - 1);
        toMatchString = input.substring(pattern.length, input.length - 1);
    }
}
Where KMP implements the Knuth–Morris–Pratt algorithm. You can find Java implementations of it on GitHub or at Princeton, or write it yourself; a minimal sketch follows.
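Here is a minimal, textbook-style KMP search in Java that returns the index of the first occurrence of pattern in text, or -1 (my own sketch, not the answerer's code):
public class Kmp {
    public static int kmp(String pattern, String text) {
        int m = pattern.length();
        if (m == 0) return 0;
        // failure[i] = length of the longest proper prefix of pattern that is also a suffix of pattern[0..i]
        int[] failure = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) k = failure[k - 1];
            if (pattern.charAt(i) == pattern.charAt(k)) k++;
            failure[i] = k;
        }
        // Scan the text, reusing the failure table so no character is examined twice.
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) k = failure[k - 1];
            if (text.charAt(i) == pattern.charAt(k)) k++;
            if (k == m) return i - m + 1; // match ends at position i
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(kmp("ell", "the boy fell by the bell")); // prints 9
    }
}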
PS: I don't code in Java, and this is a quick try at my first bounty, which is about to close soon. So please don't give me stick if I missed something trivial or made an off-by-one error.

Finding the intersection between two lists of string candidates

I wrote the following Java code to find the intersection between the prefixes and the suffixes of a String.
// you can also use imports, for example:
// import java.math.*;
import java.util.*;
class Solution {
public int max_prefix_suffix(String S) {
if (S.length() == 0) {
return 1;
}
// prefix candidates
Vector<String> prefix = new Vector<String>();
// suffix candidates
Vector<String> suffix = new Vector<String>();
// will tell me the difference
Set<String> set = new HashSet<String>();
int size = S.length();
for (int i = 0; i < size; i++) {
String candidate = getPrefix(S, i);
// System.out.println( candidate );
prefix.add(candidate);
}
for (int i = size; i >= 0; i--) {
String candidate = getSuffix(S, i);
// System.out.println( candidate );
suffix.add(candidate);
}
int p = prefix.size();
int s = suffix.size();
for (int i = 0; i < p; i++) {
set.add(prefix.get(i));
}
for (int i = 0; i < s; i++) {
set.add(suffix.get(i));
}
System.out.println("set: " + set.size());
System.out.println("P: " + p + " S: " + s);
int max = (p + s) - set.size();
return max;
}
// codility
// y t i l i d o c
public String getSuffix(String S, int index) {
String suffix = "";
int size = S.length();
for (int i = size - 1; i >= index; i--) {
suffix += S.charAt(i);
}
return suffix;
}
public String getPrefix(String S, int index) {
String prefix = "";
for (int i = 0; i <= index; i++) {
prefix += S.charAt(i);
}
return prefix;
}
public static void main(String[] args) {
Solution sol = new Solution();
String t1 = "";
String t2 = "abbabba";
String t3 = "codility";
System.out.println(sol.max_prefix_suffix(t1));
System.out.println(sol.max_prefix_suffix(t2));
System.out.println(sol.max_prefix_suffix(t3));
System.exit(0);
}
}
Some test cases are:
String t1 = "";
String t2 = "abbabba";
String t3 = "codility";
and the expected values are:
1, 4, 0
My idea was to produce the prefix candidates and push them into a vector, then find the suffix candidates and push them into a vector, and finally push both vectors into a Set and calculate the difference. However, I'm getting 1, 7, and 0. Could someone please help me figure out what I'm doing wrong?
I'd write your method as follows:
public int max_prefix_suffix(String s) {
final int len = s.length();
if (len == 0) {
return 1; // there's some dispute about this in the comments to your post
}
int count = 0;
for (int i = 1; i <= len; ++i) {
final String prefix = s.substring(0, i);
final String suffix = s.substring(len - i, len);
if (prefix.equals(suffix)) {
++count;
}
}
return count;
}
If you need to compare the prefix to the reverse of the suffix, I'd do it like this:
final String suffix = new StringBuilder(s.substring(len - i, len))
.reverse().toString();
I see that the code by @ted Hop is good.
The question asks for the maximum number of matching characters between a prefix and a suffix of the given string, where the match must be a proper substring, so the entire string is not taken into consideration for this maximum.
For example, for "abbabba" the prefix "abba" (first 4 characters) matches the suffix "abba" (last 4 characters), hence the length is 4.
For "codility", the prefixes (c, co, cod, codi, ...) and the suffixes (y, ty, ity, lity, ...) never match, hence the length is 0.
By modifying the count here from
if (prefix.equals(suffix)) {
++count;
}
with
if (prefix.equals(suffix)) {
count = prefix.length();// or suffix.length()
}
we get the max length.
But could this be done in O(n)? The built-in String.equals, I believe, takes O(n), and hence the overall complexity becomes O(n^2).
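As an aside, the longest proper prefix that is also a suffix (and therefore this maximum length) can be computed in O(n) with the KMP failure function. A hedged sketch, separate from the answers above (the empty-string case is disputed in the comments, so handle it as you prefer):
public static int maxPrefixSuffix(String s) {
    int n = s.length();
    if (n == 0) return 0;
    // failure[i] = length of the longest proper prefix of s that is also a suffix of s.substring(0, i + 1)
    int[] failure = new int[n];
    for (int i = 1, k = 0; i < n; i++) {
        while (k > 0 && s.charAt(i) != s.charAt(k)) k = failure[k - 1];
        if (s.charAt(i) == s.charAt(k)) k++;
        failure[i] = k;
    }
    return failure[n - 1]; // e.g. "abbabba" -> 4, "codility" -> 0
}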
I would use this code.
public static int max_prefix_suffix(String S)
{
if (S == null)
return -1;
Set<String> set = new HashSet<String>();
int len = S.length();
StringBuilder builder = new StringBuilder();
for (int i = 0; i < len - 1; i++)
{
builder.append(S.charAt(i));
set.add(builder.toString());
}
int max = 0;
for (int i = 1; i < len; i++)
{
String suffix = S.substring(i, len);
if (set.contains(suffix))
{
int suffLen = suffix.length();
if (suffLen > max)
max = suffLen;
}
}
return max;
}
@ravi.zombie
If you just need the maximum length, you only need to change Ted's code as below (note that this is still O(n^2) overall, because each string comparison is O(n)):
int max = 0;
for (int i = 1; i <= len - 1; ++i) {
    final String prefix = s.substring(0, i);
    final String suffix = s.substring(len - i, len);
    if (prefix.equals(suffix) && max < i) {
        max = i;
    }
}
return max;
I also left out comparing the entire string with itself, so that only proper prefixes and suffixes are considered; this should return 4, not 7, for the input string "abbabba".

having problems with inversion-counting Algorithm

I wrote an implementation of inversion counting. This is an assignment from an online course. The output I get isn't correct, even though the program compiles fine. I just don't know where I went wrong.
The program below is my implementation
import java.io.*;
class CountInversions {
//Create an array of length 1 and keep expanding as data comes in
private int elements[];
private int checker = 0;
public CountInversions() {
elements = new int[1];
checker = 0;
}
private boolean isFull() {
return checker == elements.length;
}
public int[] getElements() {
return elements;
}
public void push(int value) {
if (isFull()) {
int newElements[] = new int[elements.length * 2];
System.arraycopy(elements, 0, newElements, 0, elements.length);
elements = newElements;
}
elements[checker++] = value;
}
public void readInputElements() throws IOException {
//Read input from file and until the very last input
try {
File f = new File("IntegerArray.txt");
FileReader fReader = new FileReader(f);
BufferedReader br = new BufferedReader(fReader);
String stringln;
while ((stringln = br.readLine()) != null) {
push(Integer.parseInt(stringln));
}
System.out.println(elements.length);
fReader.close();
} catch (Exception e) {//Catch exception if any
System.err.println("Error: " + e.getMessage());
} finally {
// in.close
}
}
//Perform the count inversion algorithm
public int countInversion(int array[],int length){
int x,y,z;
int mid = array.length/2 ;
int k;
if (length == 1){
return 0;
}else{
//count Leftinversion and count Rightinversion respectively
int left[] = new int[mid];
int right[] = new int[array.length - mid];
for(k = 0; k < left.length;k++){
left[k] = array[k];
}
for(k = 0 ;k < right.length;k++){
right[k] = array[mid + k];
}
x = countInversion(left, left.length);
y = countInversion(right, right.length);
int result[] = new int[array.length];
z = mergeAndCount(left,right,result);
//count splitInversion
return x + y + z;
}
}
private int mergeAndCount(int[] left, int[] right, int[] result) {
int count = 0;
int i = 0, j = 0, k = 0;
int m = left.length, n = right.length;
while(i < m && j < n){
if(left[i] < right[j]){
result[k++] = left[i++];
}else{
result[k++] = right[j++];
count += left.length - i;
}
}
if(i < m){
for (int p = i ;p < m ;p++){
result[k++] = left[p];
}
}
if(j < n){
for (int p = j ;p < n ;p++){
result[k++] = right[p];
}
}
return count;
}
}
class MainApp{
public static void main(String args[]){
int count = 0;
CountInversions cIn = new CountInversions();
try {
cIn.readInputElements();
count = cIn.countInversion(cIn.getElements(),cIn.getElements().length);
System.out.println("Number of Inversios: " + count);
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Your code works if the array length is a power of 2 (actually, I'm not sure whether it does, see the second point below).
When reading the input, you double the array size when it's full, but you never resize it down to the number of items actually read, which is stored in checker. So your array length is a power of 2, and if the number of ints read from the file is not a power of 2, you end up with an array that is too long, with trailing 0 elements in the places allocated but not filled from the file. Since you call countInversion with the length of the array and not with the number of read items, these 0s are sorted too, yielding some spurious inversions.
After reading the input, allocate a new array
int[] copy = new int[checker];
for(int i = 0; i < checker; ++i) {
copy[i] = elements[i];
}
elements = copy;
copy the elements, and discard the old array with the wrong capacity.
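Equivalently, if you are on Java 6 or later, java.util.Arrays can do the trimming in one line:
elements = java.util.Arrays.copyOf(elements, checker); // keeps exactly the checker values that were read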
Another problem in your algorithm is that you never change the original array because you allocate a new array for the merge result,
int result[] = new int[array.length];
z = mergeAndCount(left,right,result);
so you are merging unsorted arrays, which may also skew the inversion count. Since you copied the halves of the input array to new arrays for the recursive calls, you can without problem put the merge result in the passed-in array, so replace the above two lines with
z = mergeAndCount(left,right,array);
to get a method that actually sorts the array and counts the inversions.
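To make the merge-and-count step concrete, here is a small worked trace of mergeAndCount on left = [1, 3, 5] and right = [2, 4, 6] (my own example): take 1 from left; take 2 from right and add count += 2 for the pairs (3, 2) and (5, 2); take 3 from left; take 4 from right and add count += 1 for the pair (5, 4); then take 5 and 6. The merged result [1, 2, 3, 4, 5, 6] is written into the passed-in array, and the returned split-inversion count is 3.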
This post tackles the problem of counting inversions in Java (except for the file reading, which you have done OK): Counting inversions in an array
