What is the complexity of these two string-based algorithms? - java

I have written these two algorithms to check a string for duplicate characters (ABBC, AAAC). The first uses the HashSet data structure, whilst the second relies purely on iteration.
Algorithm 1
String s = "abcdefghijklmnopqrstuvwxxyz";

public boolean isUnique(String s) {
    Set<Character> charSet = new HashSet<Character>();
    for (int i = 0; i < s.length(); i++) {
        if (charSet.contains(s.charAt(i))) {
            return false;
        }
        charSet.add(s.charAt(i));
    }
    return true;
}
Algorithm 2
String s = "abcdefghijklmnopqrstuvwxxyz";

public boolean isUnique2(String s) {
    for (int i = 0; i < s.length() - 1; i++) {
        for (int j = i + 1; j < s.length(); j++) {
            if (s.charAt(i) == s.charAt(j)) {
                return false;
            }
        }
    }
    return true;
}
My thoughts are that the first algorithm is O(N) and the second algorithm is O(N^2). When I run execution-time tests on my (possibly unreliable) laptop, the average time for the first algorithm is 2020ns, whilst the second algorithm takes 995ns. This goes against my calculation of the algorithms' complexity. Could anybody advise me?

When using O() notation you ignore constants, which means that O(n) == O(10^10 * n). So while O(n^2) > O(n) holds asymptotically, it's not necessarily true for smaller values of n.
In your case, imagine that resizing the array behind the HashSet could be more time-consuming than iterating over the input.

Assuming that the charAt method runs in O(1) time, the first algorithm is O(N) and the second is O(N^2). A linear-time algorithm is not supposed to be faster than a quadratic one for all inputs; it will only be faster after a certain N (which could possibly be in the millions).
for example:
void funcA(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < 10000; j++) {
            int k = i + j;
        }
    }
}

void funcB(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int k = i + j;
        }
    }
}
Even though funcA is linear and funcB is quadratic, it is easy to see that funcB will be faster than funcA for n < 10000. In your case, the HashSet requires time to compute hashes and so may be slower for inputs below a certain size.
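The crossover can be made concrete without timing anything, simply by counting inner-loop iterations (a sketch; the class and method names are my own, and `opsLinear`/`opsQuadratic` just model how many times funcA's and funcB's inner bodies run):

```java
public class CrossoverDemo {
    // funcA's inner body runs n * 10000 times: linear, but with a big constant
    static long opsLinear(int n) {
        return (long) n * 10_000;
    }

    // funcB's inner body runs n * n times: quadratic, with a tiny constant
    static long opsQuadratic(int n) {
        return (long) n * n;
    }

    public static void main(String[] args) {
        // below n = 10000 the quadratic function does strictly less work
        System.out.println(opsLinear(100));       // 1000000
        System.out.println(opsQuadratic(100));    // 10000
        // above n = 10000 the ranking flips
        System.out.println(opsLinear(20_000));    // 200000000
        System.out.println(opsQuadratic(20_000)); // 400000000
    }
}
```

The two operation counts meet exactly at n = 10000, which is where the constant hidden by the O() notation stops mattering.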

The micro-benchmarking you are doing can give very misleading information about algorithmic complexity.
It's easy to "port" your algorithms to check for duplicates in, say, an array of Integers.
I then recommend testing performance on, say, an array of 10^7 elements; you will definitely see the difference.
That way you'd be able to confirm your initially correct estimate: O(N) for the HashSet version vs O(N^2) for the nested-loop version.
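A sketch of that experiment (the class and method names are my own, and the size here is deliberately modest so the quadratic version finishes): port both algorithms to int arrays and time them on large random input.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DuplicateBenchmark {
    // Expected O(N): one pass; Set.add returns false when the value is already present
    static boolean isUniqueSet(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int v : a) {
            if (!seen.add(v)) {
                return false;
            }
        }
        return true;
    }

    // O(N^2): compare every pair
    static boolean isUniqueLoops(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] == a[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 30,000 random ints over the full int range are almost certainly all distinct,
        // so both methods do their full amount of work. Push this towards 10^7 and the
        // nested-loop version becomes infeasible while the HashSet version stays fast.
        int[] data = new Random(42).ints(30_000).toArray();
        long t0 = System.nanoTime();
        boolean r1 = isUniqueSet(data);
        long t1 = System.nanoTime();
        boolean r2 = isUniqueLoops(data);
        long t2 = System.nanoTime();
        System.out.println("HashSet: " + r1 + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("Loops:   " + r2 + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```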

There is a problem with your test data: if you limit yourself to English-language characters (a-z), you are guaranteed to have a duplicate whenever the string length exceeds 26. In the specific example you provided, the string "abcdefghijklmnopqrstuvwxxyz" is sorted and the duplicate element x sits towards the end. As a result, the iterated lookup is faster, because there is overhead in building the HashSet as you continue to parse the String.
A better test would be to use randomly generated integer sequences of large size drawn from a large range of values, e.g. up to Long.MAX_VALUE.
Below is a test that disproves your assertion that array search is faster. Run it a few times and see for yourself, or take averages over 1000 runs:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class FindDuplicatesTest {

    public static final String s = generateRandomString(100000);

    private static String generateRandomString(int numChars) {
        Random random = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numChars; i++) {
            int codePoint = random.nextInt(65536);
            sb.append(Character.toChars(codePoint));
        }
        return sb.toString();
    }

    public boolean isUnique(String s) {
        Set<Character> charSet = new HashSet<Character>();
        for (int i = 0; i < s.length(); i++) {
            if (charSet.contains(s.charAt(i))) {
                return false;
            }
            charSet.add(s.charAt(i));
        }
        return true;
    }

    public boolean isUnique2(String s) {
        for (int i = 0; i < s.length() - 1; i++) {
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FindDuplicatesTest app = new FindDuplicatesTest();
        long start = System.nanoTime();
        boolean result = app.isUnique(s);
        long stop = System.nanoTime();
        System.out.println(result);
        System.out.println("HashSet Search Time: " + (stop - start));
        start = System.nanoTime();
        result = app.isUnique2(s);
        stop = System.nanoTime();
        System.out.println(result);
        System.out.println("Array Search Time: " + (stop - start));
    }
}

Related

Verifying the time complexity of selection sort by counting the number of comparisons

I'm trying to verify that selection sort is O(n^2) by counting the number of times numbers are being compared in my array. I'm using an array of 20 numbers so I know I should be getting somewhere around 400, but with the current placement of my counting variable I'm only getting around 200 comparisons being counted. Here is the code for the selection sort I am using:
static int swapCount = 0; // counts comparisons, despite its name

public static void selectionSort(final int[] arr) {
    for (int i = 0; i < arr.length - 1; i++) {
        int minElementIndex = i;
        for (int j = i + 1; j < arr.length; j++) {
            swapCount++;
            if (arr[minElementIndex] > arr[j]) {
                minElementIndex = j;
            }
        }
        if (minElementIndex != i) {
            int temp = arr[i];
            arr[i] = arr[minElementIndex];
            arr[minElementIndex] = temp;
        }
    }
}
The swapCount variable is being used to track the comparisons and I've tried moving it around everywhere I could think of, but I can't get a count nearly as high as I'm expecting when I print the total swapCount at the end. Where exactly in the code should swapCount be to track all the comparisons?
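For what it's worth, a counter in exactly the position of swapCount above already counts every comparison: selection sort on n elements performs exactly n(n-1)/2 comparisons, which for n = 20 is 190, so a count around 200 is correct and the expectation of ~400 is what's off. A standalone sketch (the class and method names are mine) that returns the count:

```java
public class SelectionSortCount {
    // Selection sort that reports how many element comparisons it made:
    // (n-1) + (n-2) + ... + 1 = n * (n - 1) / 2
    static int countComparisons(int[] arr) {
        int comparisons = 0;
        for (int i = 0; i < arr.length - 1; i++) {
            int minElementIndex = i;
            for (int j = i + 1; j < arr.length; j++) {
                comparisons++; // one comparison per inner-loop iteration
                if (arr[minElementIndex] > arr[j]) {
                    minElementIndex = j;
                }
            }
            if (minElementIndex != i) {
                int temp = arr[i];
                arr[i] = arr[minElementIndex];
                arr[minElementIndex] = temp;
            }
        }
        return comparisons; // 190 for a 20-element array
    }
}
```

Note that O(n^2) only says the count grows proportionally to n^2; the constant here is 1/2, which is why 190 rather than 400 appears.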

Longest Most Common Substring Based on Whole-Word Phrases

I've been doing a lot of research around this topic and can't quite crack this one easily. There are a lot of valuable solutions I've come across online for solving this problem based on characters, but how would you solve this problem based on whole-word phrases to avoid the result returning a phrase that contains a partial word at the start or end of the phrase?
For example, given an Array of Strings, the output would be the most common whole-word phrase that is contained in most (not all) of the Strings within the Array.
The example below is the closest I've found so far, but it only works about half of the time and includes partial-word results, which isn't quite what I'm after. I'm sure someone has solved this one before.
// function to find the stem (longest common
// substring) from the string array
public static String findstem(String arr[]) {
    // Determine size of the array
    int n = arr.length;
    // Take the first word from the array as reference
    String s = arr[0];
    int len = s.length();
    String res = "";
    for (int i = 0; i < len; i++) {
        for (int j = i + 1; j <= len; j++) {
            // generate all possible substrings
            // of our reference string arr[0], i.e. s
            String stem = s.substring(i, j);
            // Check if the generated stem is
            // common to all words
            int k;
            for (k = 1; k < n; k++) {
                if (!arr[k].contains(stem)) {
                    break;
                }
            }
            // If the current substring is present in
            // all strings and its length is greater
            // than the current result, keep it
            if (k == n && res.length() < stem.length()) {
                res = stem;
            }
        }
    }
    return res;
}

// Driver Code
public static void main(String args[]) {
    String arr[] = { "grace", "graceful", "disgraceful", "gracefully" };
    String stems = findstem(arr);
    System.out.println(stems);
}
Does this do what you intended? It simply checks whether any word is a substring of itself and of the others.
If you want to check for real word substrings, you would need to reference a dictionary, which would be very time-consuming.
String arr[] = { "grace", "graceful", "disgraceful", "gracefully" };
String save = "";
int count = 0;
for (int i = 0; i < arr.length && count != arr.length; i++) {
    count = 0;
    for (int k = 0; k < arr.length; k++) {
        if (arr[k].contains(arr[i])) {
            count++;
            save = arr[i];
        }
    }
}
System.out.println(save);
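For the whole-word-phrase version of the problem, one possible sketch (the class name, the space-padding trick, and the tie-breaking rule are my own choices, not from the question): split the first string into words, generate every contiguous word sequence, and count how many of the strings contain it as a whole-word phrase, keeping the most common one and breaking ties by length. Phrases that appear in fewer than two strings are ignored.

```java
import java.util.Arrays;

public class CommonPhrase {
    // Hypothetical sketch: the longest whole-word phrase shared by the most strings.
    static String longestCommonPhrase(String[] sentences) {
        String best = "";
        int bestCount = 0;
        String[] words = sentences[0].split("\\s+");
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j <= words.length; j++) {
                // every contiguous word sequence of the first string
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, j));
                int count = 0;
                for (String sentence : sentences) {
                    // pad with spaces so only whole-word occurrences match
                    if ((" " + sentence + " ").contains(" " + phrase + " ")) {
                        count++;
                    }
                }
                // keep the most common phrase, break ties by length,
                // and ignore phrases that occur in fewer than two strings
                if (count >= 2 && (count > bestCount
                        || (count == bestCount && phrase.length() > best.length()))) {
                    best = phrase;
                    bestCount = count;
                }
            }
        }
        return best;
    }
}
```

Like the character-based version, this is cubic-ish in the worst case, but it can never return a phrase that starts or ends mid-word, because matching happens only at space boundaries. Punctuation handling would need extra normalization.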

How to reduce the number of for loops

How can I improve this code? I am getting accurate output, but it seems a little long and performs unnecessary operations. Any suggestions?
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        List<Integer> a = new ArrayList<Integer>();
        a.add(1);
        a.add(2);
        List<Integer> b = new ArrayList<Integer>();
        b.add(3);
        b.add(5);
        System.out.println(test(5, a, b));
    }

    public static long test(int n, List<Integer> a, List<Integer> b) {
        long retCnt = 0;
        List<String> enemy = new ArrayList<String>();
        for (int i = 0; i < a.size(); i++) {
            enemy.add(a.get(i) + "" + b.get(i));
        }
        String tempstr = "";
        int tempj = 1;
        for (int m = 1; m <= n; m++) {
            int temp = 1;
            for (int i = 1; i <= n; i++) {
                tempstr = "";
                for (int j = tempj; j <= temp; j++) {
                    tempstr += j;
                }
                temp++;
                if (!"".equalsIgnoreCase(tempstr)) {
                    if (isValidGroup(enemy, tempstr)) {
                        retCnt++;
                    } else {
                        break;
                    }
                }
            }
            tempj++;
        }
        return retCnt;
    }

    public static boolean isValidGroup(List<String> enemy, String group) {
        for (int i = 0; i < enemy.size(); i++) {
            if (group.trim().toUpperCase().contains(String.valueOf(enemy.get(i).charAt(0)).toUpperCase())
                    && group.trim().contains(String.valueOf(enemy.get(i).charAt(1)).toUpperCase())) {
                return false;
            }
        }
        return true;
    }
}
Short description of the problem statement:
I have an enemy list, which contains pairs such as 13 and 25 built from the input lists a and b respectively.
I have a number n, say 5, and I have to generate the possible permutations which are not part of the enemy list.
Please comment if further clarification is needed.
Your code is slow: if n were 100, it would require more than 100 million operations to execute.
The whole test function can, however, be executed in O(N) with some binomial math, if you jump directly over the indices where invalid numbers sit. It can also be done in O(N^2) with the very simple algorithm below.
The first thing I would do to save memory and code is delete the variables tempj and temp, because the variables m and i can do the same work: they always hold the same values, and they have to be created anyway to perform the right number of iterations.
Another useful thing to notice is that tempj will sometimes (in around half of all iterations, to be exact) be bigger than temp. On all those occasions you won't find any valid permutations, because j iterates only from tempj up to temp in increasing order. In other words, half of the computations are useless.
tempstr can be precomputed.
Imagine tempj was 1 and temp was 3. j will then do 2 iterations, from 1 to 2 and from 2 to 3. j has reached temp, so you add one to temp. temp is now 4 and tempj is still 1.
Now j has to repeat the exact same 2 steps to get from 1 to 3, and then take one additional step to get to 4, where temp is. You can skip those 2 repeated steps because you already know what tempstr will look like after them: instead of resetting j, keep increasing it as temp increases.
Here is a snippet of the O(N^2) version (not counting isValidGroup()'s complexity, which can easily be optimized using an array of booleans in which you mark the invalid positions):
String tempstr = "";
for (int start = 1; start <= n; start++) {
    tempstr = "";
    for (int end = start; end <= n; end++) {
        tempstr += end;
        if (isValidGroup(enemy, tempstr)) {
            retCnt++;
        } else {
            break;
        }
    }
}
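As a sketch of the isValidGroup() optimization mentioned above (assuming, like the code in the question, single-digit group members, i.e. n <= 9; the class name is my own): mark which digits occur in one pass, so each enemy pair costs two array lookups instead of two substring searches.

```java
import java.util.List;

public class ValidGroup {
    // Sketch: O(group length + enemy count) validity check using a presence
    // array instead of repeated String.contains() calls.
    static boolean isValidGroup(List<String> enemy, String group) {
        boolean[] present = new boolean[10]; // digits 0-9
        for (char c : group.toCharArray()) {
            present[c - '0'] = true;
        }
        for (String pair : enemy) {
            if (present[pair.charAt(0) - '0'] && present[pair.charAt(1) - '0']) {
                return false; // both members of an enemy pair are in the group
            }
        }
        return true;
    }
}
```

For n above 9 the string encoding of groups and pairs breaks down anyway (digits become ambiguous), so a real fix would switch to integer lists rather than concatenated strings.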

Big O of String repetition append in Java

I have a function that builds a string as a repetition of an original string. I'm wondering, if I use StringBuilder's append, what is the Big O of the function? Is it O(n*l), where n is the number of repeats and l is the length of the original string?
public String getRepetition(String originalStr, Integer n) {
    StringBuilder str = new StringBuilder();
    for (int i = 0; i < n; i++) {
        str.append(originalStr);
    }
    return str.toString();
}
Comparing with the approach below, which one is better?
public String getRepetition(String originalStr, Integer n) {
    String str = "";
    for (int i = 0; i < n; i++) {
        str += originalStr;
    }
    return str;
}
I'm not sure why the other three answers all say both pieces of code are O(n). Assuming originalStr is not "", the first is O(n) and the other O(n^2)! (That's an exclamation mark, not a factorial.) They teach this on the first day of Java school. C programmers get "don't call strlen in the condition of a for loop"; Java programmers get this.
String str = "";
for (int i = 0; i < n; i++) {
    str += originalStr;
}

Each time around this loop str gets longer: its length is i * originalStr.length(). Creating a new String (assuming no wild compiler optimisations) therefore takes time roughly proportional to i each time.
Edit: usually we ignore the length of the original string, but yes, of course the running time is proportional to it, so the two are really O(n * strlen(originalStr)) and O(n^2 * strlen(originalStr)). By convention this factor is dealt with separately.
If we rewrite the code without the String abstraction, perhaps it will be clearer.
public static char[] getRepetition(char[] originalStr, int n) {
    char[] str = {};
    for (int i = 0; i < n; ++i) {
        assert str.length == i * originalStr.length;
        char[] newStr = new char[str.length + originalStr.length];
        for (int j = 0; j < str.length; ++j) {
            newStr[j] = str[j];
        }
        for (int j = 0; j < originalStr.length; ++j) {
            newStr[str.length + j] = originalStr[j];
        }
        str = newStr;
    }
    return str;
}
(As ever, I've not bothered to so much as compile the code. Not safe to use in a nuclear reactor.)
Just for giggles, let's deabstract the first implementation.
public static char[] getRepetition(char[] originalStr, int n) {
    char[] str = new char[16];
    int strLen = 0;
    for (int i = 0; i < n; ++i) {
        assert strLen == i * originalStr.length;
        // ensureCapacity
        if (str.length < strLen + originalStr.length) {
            // The size at least doubles,
            // so this happens increasingly less often.
            // It won't happen again for approximately
            // the same number of iterations
            // as have already happened!
            char[] newStr = new char[Math.max(
                strLen + originalStr.length, // always enough room
                str.length * 2 + 2 // *2 !
            )];
            for (int j = 0; j < strLen; ++j) {
                newStr[j] = str[j];
            }
            str = newStr;
        }
        // actual append
        for (int j = 0; j < originalStr.length; ++j) {
            str[strLen++] = originalStr[j];
        }
    }
    // toString
    char[] newStr = new char[strLen];
    for (int i = 0; i < newStr.length; ++i) {
        newStr[i] = str[i];
    }
    return newStr;
}
Both of your approaches are O(n), but the first approach eliminates several temporary Strings. It isn't clear why you have made n an Integer, nor why you have not made this a static method (it depends on no instance state). Additionally, in Java 8+, you could implement it with a lambda like
public static String getRepetition(String originalStr, int n) {
    return Stream.generate(() -> originalStr).limit(n).collect(Collectors.joining());
}
Also, if you're going to use a StringBuilder as in your first example, you can explicitly size it to avoid having to amortize the cost of resizing the StringBuilder
StringBuilder str = new StringBuilder(originalStr.length() * n);
In both cases the complexity is O(n) because you are iterating n times.
The only difference in the second approach is that you are creating a new String in each iteration, i.e. at str += originalStr;
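For what it's worth, on Java 11+ there is also the built-in String.repeat, which sizes the result once up front, so the whole operation is O(n * originalStr.length()) with a small constant (a one-line sketch; the wrapper class is mine):

```java
public class RepeatDemo {
    // Java 11+ only: String.repeat allocates the result buffer once
    public static String getRepetition(String originalStr, int n) {
        return originalStr.repeat(n);
    }
}
```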

How to check the inequality of all integers in an array?

I'm trying to figure out how to check whether all 23 numbers in my array (the numbers are randomly generated) are NOT equal to each other, but I can't figure out how to do it without a super-ridiculous if statement. Is there any other way I could do it? Another option would be to check whether any two numbers in the array are equal, but the reason I posed the question the way I did is that I figured checking the equality of each pair would be harder than checking the inequality of all the numbers.
Your problem can be solved with two nested for-loops:
public static boolean hasDuplicates(int[] array)
{
    for (int i = 0, length = array.length; i < length; i++)
    {
        int val = array[i];
        for (int j = 0; j < i; j++)
        {
            if (array[j] == val)
            {
                return true;
            }
        }
    }
    return false;
}
The outer loop runs over every element in the array, while the inner one checks whether any of the values before the outer loop's index are equal to the value at array[i]. You can safely use j < i, both for performance and to ensure that an element is never compared against itself.
You can use an algorithm like the following to test whether every array element is unique:
boolean everyNumberIsUnique(int[] numbers) {
    for (int i = 0; i < numbers.length; i++) {
        for (int j = i + 1; j < numbers.length; j++) {
            if (numbers[i] == numbers[j]) {
                return false;
            }
        }
    }
    return true;
}
It simply compares each number against every other number and returns false if any two are equal.
If you have a relatively low upper bound on your numbers (say, less than 1 million) and don't really care about memory usage, you can just create a boolean array and set an element to true when its index appears in your array. This is O(n), so it's probably the best you can do, but, admittedly, you have to meet the above criteria.
public boolean allDifferent(int[] numbers, int upperBound) // upperBound: largest possible value
{
    // everything in the array defaults to false
    boolean[] array = new boolean[upperBound + 1];
    for (int i = 0; i < numbers.length; i++)
    {
        if (array[numbers[i]]) // if we've already seen this number (aka duplicate)
        {
            return false;
        }
        array[numbers[i]] = true; // note that we have now seen this number
    }
    return true;
}
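If the values can't be bounded, a HashSet gives the same expected O(n) behaviour without the memory caveat (a sketch; the class name is mine, and Set.add returning false on an already-present element is what detects the duplicate):

```java
import java.util.HashSet;
import java.util.Set;

public class AllDifferent {
    // Expected O(n) for any int values: one pass, one hash lookup per element
    static boolean allDifferent(int[] numbers) {
        Set<Integer> seen = new HashSet<>();
        for (int v : numbers) {
            if (!seen.add(v)) { // add() returns false if v was already seen
                return false;
            }
        }
        return true;
    }
}
```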
