Calculating space complexity of string compression - Cracking the coding interview

I am trying to understand the space complexity of the following piece of code. The code compresses Strings from "aabbbb" to "a2b4". The question is Question 5, Chapter 1 from Cracking the Coding Interview, 5th edition (2013), and the code is taken from the solutions:
public static String compressBetter(String str) {
int size = countCompression(str);
if (size >= str.length()) {
return str;
}
StringBuffer mystr = new StringBuffer();
char last = str.charAt(0);
int count = 1;
for (int i = 1; i < str.length(); i++) {
if (str.charAt(i) == last) {
count++;
} else {
mystr.append(last);
mystr.append(count);
last = str.charAt(i);
count = 1;
}
}
mystr.append(last);
mystr.append(count);
return mystr.toString();
}
where
public static int countCompression(String str) {
if (str == null || str.isEmpty()) return 0;
char last = str.charAt(0);
int size = 0;
int count = 1;
for (int i = 1; i < str.length(); i++) {
if (str.charAt(i) == last) {
count++;
} else {
last = str.charAt(i);
size += 1 + String.valueOf(count).length();
count = 1;
}
}
size += 1 + String.valueOf(count).length();
return size;
}
According to the author compressBetter has O(N) space complexity. Why is that not O(1)?
In every run of countCompression we hold last, size and count, and similarly for compressBetter (which holds the countCompression variables plus mystr, last and count). My understanding of space complexity is "how much memory the algorithm needs/holds at any time". In other words, space complexity, unlike time complexity, is not cumulative.
Note that the author considers in the book only what people refer to as "auxiliary space complexity" (without the space needed to store the input), as in the example above. Also, afaik there is no entry in the errata of the book on this.
UPDATE: My source of confusion was from the following example (Question 1.1 in the same book)
public static boolean isUniqueChars2(String str) {
boolean[] char_set = new boolean[256];
for (int i = 0; i < str.length(); i++) {
int val = str.charAt(i);
if (char_set[val]) return false;
char_set[val] = true;
}
return true;
}
which is O(1) despite the allocation of a 256-element boolean array. I thought allocations don't matter when calculating space complexity, but in reality it's O(1) because the space needed is constant and independent of the input size (unlike the mystr StringBuffer).

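Just converting my previous comment into an answer: you are holding a StringBuffer that, potentially, could have a size proportional to the String's one. Just think about an input String made of short runs of repeated characters, such as "aaabbbccc": the buffer ends up holding "a3b3c3", which is 2N/3 characters. (An input with no consecutive repeated characters would need 2N characters, but the size check at the top returns the original String before the buffer is built.)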

You are asking about the space complexity of compressBetter, which includes a call to countCompression, but also performs additional work.
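While the space complexity of countCompression is indeed O(1), compressBetter has linear space complexity (i.e. O(N)) in the worst case, since the StringBuffer it builds grows with the input. An input in which no two consecutive characters are equal would need 2N characters, although the size check returns early for it; an input made of runs of three equal characters (e.g. "aaabbbccc") passes the check and still builds a StringBuffer of 2N/3 characters, which is O(N).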
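To see the linear growth concretely, here is a minimal sketch that measures the length of the compressed output for inputs made of runs of three equal characters. The class name CompressionSpaceDemo and the helper runsOfThree are hypothetical names introduced for this illustration, and the compress method is a simplified variant of compressBetter (StringBuilder instead of StringBuffer, no early return), not the book's code.

```java
class CompressionSpaceDemo {
    // Same run-length idea as compressBetter, without the size check,
    // so the output buffer size can be observed for any input.
    static String compress(String str) {
        if (str == null || str.isEmpty()) return "";
        StringBuilder out = new StringBuilder();
        char last = str.charAt(0);
        int count = 1;
        for (int i = 1; i < str.length(); i++) {
            if (str.charAt(i) == last) {
                count++;
            } else {
                out.append(last).append(count);
                last = str.charAt(i);
                count = 1;
            }
        }
        out.append(last).append(count);
        return out.toString();
    }

    // Hypothetical helper: builds n/3 runs of three equal letters,
    // cycling through 'a'..'z' so neighbouring runs always differ.
    static String runsOfThree(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n / 3; i++) {
            char c = (char) ('a' + i % 26);
            sb.append(c).append(c).append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The output length is always two thirds of the input length.
        for (int n = 30; n <= 3000; n *= 10) {
            String in = runsOfThree(n);
            System.out.println(in.length() + " -> " + compress(in).length());
        }
    }
}
```

The buffer holds two characters per run, so its size scales with N rather than staying constant, which is what distinguishes it from the fixed 256-element array in isUniqueChars2.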

Related

How to Optimise a given problem time complexity?

I have a question which says
Given an input
ababacad
In the output, all the a's should come first, together, and the remaining characters should follow in their original order, i.e.
aaaabbcd
I solved it like below code
String temp="", first="" ;
for(int i=0;i<str.length();i++) {
if(str.charAt(i)!='a')
temp=temp+str.charAt(i);
else
first=first+str.charAt(i);
}
System.out.print(first+temp);
The output matches, but it says it is still not optimised. I guess it's already of order N complexity. Can it be optimised further?

Optimization here can mean string operations as well as the number of iterations. So, just to be on the safe side, you can implement it using arrays. The time complexity will be O(n) which is the minimum for this problem as you have to check every position for the presence of char 'a'.
String input = "ababacad";
char[] array = input.toCharArray();
char[] modified = new char[array.length];
int a = 0; // index for a
int b = array.length - 1; // index for not a
for (int i = array.length - 1; i >= 0; --i) {
if (array[i] != 'a')
modified[b--] = array[i];
else
modified[a++] = array[i];
}
String output = new String(modified);
System.out.println(output);
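If the "string operations" are the concern, another option is to keep the single forward pass but append to two StringBuilder objects instead of concatenating Strings, since each String concatenation in a loop copies the text built so far. The following is a minimal sketch under that assumption; the class and method name MoveAsFirst and moveAsFirst are hypothetical.

```java
class MoveAsFirst {
    // Collects the 'a' characters and the remaining characters separately,
    // preserving the original order of the remaining characters.
    static String moveAsFirst(String str) {
        StringBuilder first = new StringBuilder(str.length());
        StringBuilder rest = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == 'a') {
                first.append(c);
            } else {
                rest.append(c);
            }
        }
        return first.append(rest).toString();
    }

    public static void main(String[] args) {
        System.out.println(moveAsFirst("ababacad")); // prints aaaabbcd
    }
}
```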

sort a string in recursion

I have a certain string and I want to sort it using recursion.
My code compiles without errors, but the algorithm is not working and I need help.
The index is zero on the first call to the function.
The main idea is to compare adjacent indexes in the string and to create a new string each time with the compared letters in their new order.
On each call I pass in the new string that was created in the previous run.
private static String sort(String s1, int index)
{
String s2="";
if (index == s1.length()-2)
return s1;
else
{
if (s1.charAt(index) > s1.charAt(index+1))
{
for (int i = 0; i < s1.length(); i++)
{
if (index == i)
{
s2 += s1.charAt(index+1);
s2 += s1.charAt(index);
i += 2;
}
s2 += s1.charAt(i);
}
}
else
{
for (int i = 0; i < s1.length(); i++)
{
if (index == i)
{
s2 += s1.charAt(index);
s2 += s1.charAt(index+1);
i += 2;
}
s2 += s1.charAt(i);
}
}
return (sort(s2,++index));
}
}
input : acbacds
output: abaccds
the output should be : aabccds

Each call compares a pair of adjacent characters; if they're out of order, you switch them.
Your recursion simply replaces an outer loop running through the length of the array.
The end of this process guarantees that the largest value will now be at the end of the array. To this extent, it works correctly.
If you expect an array of N elements to get fully sorted, you must repeat this process up to N-1 times. The only reason your given example is so close is that the array you gave it is already very close to sorted.
Try again with something in reverse order, and you'll see the effect. For instance, use "hgfedcba". One pass will get you "gfedcbah", moving the 'h' from the front to the end.
If you want a working bubble sort, try searching here on SO or on the web overall.
Finally, you might look into the Java substring functions; building s2 a character at a time is hard on the eyes; it's also slow, especially in the case where you don't switch characters.
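For reference, here is a minimal sketch of a recursive bubble sort that repeats the pass as described above. It works on a char array rather than rebuilding strings, which is a substitution of mine and not the asker's approach; the class name RecursiveBubbleSort is hypothetical.

```java
class RecursiveBubbleSort {
    static String sort(String s) {
        char[] chars = s.toCharArray();
        sort(chars, chars.length);
        return new String(chars);
    }

    // One pass moves the largest of the first n characters to position n - 1;
    // the recursion then repeats the pass on the first n - 1 characters.
    private static void sort(char[] chars, int n) {
        if (n <= 1) return;
        for (int i = 0; i + 1 < n; i++) {
            if (chars[i] > chars[i + 1]) {
                char tmp = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = tmp;
            }
        }
        sort(chars, n - 1);
    }

    public static void main(String[] args) {
        System.out.println(sort("acbacds"));  // prints aabccds
        System.out.println(sort("hgfedcba")); // prints abcdefgh
    }
}
```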

In finding cumulative sum of a number in java , we should use string or list to store numbers?

When finding the cumulative sum of a number, should we use a String or an ArrayList to store the numbers?
The code I have written is below; I want to know whether there is a more efficient way to do it.
private static int cumulative_sum(int num) {
int count = 0;
for (int k = 1; k <= num; k++) {
String number = k + "";
for (int i = 0; i < number.length(); i++) {
count = count + Integer.parseInt(number.charAt(i) + "");
}
}
return count;
}

It seems that you want to sum the digits of all the numbers from 1 to num. Using strings is unnecessary and quite inefficient even for the naive algorithm. You can just repeatedly divide by 10:
private static int cumulative_sum(int num) {
int count = 0;
for (int k = 1; k <= num; k++) {
int number = k;
while(number > 0) {
count+=number % 10;
number/=10;
}
}
return count;
}
Internally, creating the String does a similar thing, but it also allocates memory for every new String object (and its internal char[] array as well). You also create a new String object for every char by concatenating it with an empty String, which is not very fast either. Thus for big num values you may have quite a big overhead from allocating and freeing memory. In contrast, my version does not create or use heap objects at all. My simple benchmark shows that for num = 1000000 my version is 14 times faster (13 ms vs 196 ms).
Note that there are much more efficient algorithms to sum all the digits of all numbers from 1 to num. See OEIS sequence A037123 for details.
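As one illustration of a faster approach, the sketch below counts the contribution of each decimal position separately, so the work is proportional to the number of digits of num rather than to num itself. This is a common per-position counting technique and not necessarily one of the formulas listed under A037123; the class name DigitSums and both method names are hypothetical, and it assumes num is small enough that p * 10 does not overflow a long.

```java
class DigitSums {
    // Sum of the digits of all numbers from 1 to num, one decimal position at a time.
    static long cumulativeDigitSum(long num) {
        long total = 0;
        for (long p = 1; p <= num; p *= 10) {
            long high = num / (p * 10); // digits to the left of this position
            long cur = (num / p) % 10;  // digit at this position
            long low = num % p;         // digits to the right of this position
            total += high * 45 * p;           // complete 0..9 cycles at this position
            total += cur * (cur - 1) / 2 * p; // digits 0..cur-1 in the partial cycle
            total += cur * (low + 1);         // the digit cur itself
        }
        return total;
    }

    // Naive reference used only to cross-check the formula.
    static long naive(long num) {
        long count = 0;
        for (long k = 1; k <= num; k++) {
            for (long number = k; number > 0; number /= 10) {
                count += number % 10;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(cumulativeDigitSum(12));      // prints 51
        System.out.println(cumulativeDigitSum(1000000)); // prints 27000001
    }
}
```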

Why is iterative permutation generator slower than recursive?

I'm comparing two functions for use as my permutation generator. This question is about a lot of things: the string intern table, the pros and cons of using iteration vs recursion for this problem, etc...
public static List<String> permute1(String input) {
LinkedList<StringBuilder> permutations = new LinkedList<StringBuilder>();
permutations.add(new StringBuilder(""+input.charAt(0)));
for(int i = 1; i < input.length(); i++) {
char c = input.charAt(i);
int size = permutations.size();
for(int k = 0; k < size ; k++) {
StringBuilder permutation = permutations.removeFirst(),
next;
for(int j = 0; j < permutation.length(); j++) {
next = new StringBuilder();
for(int b = 0; b < permutation.length(); next.append(permutation.charAt(b++)));
next.insert(j, c);
permutations.addLast(next);
}
permutation.append(c);
permutations.addLast(permutation);
}
}
List<String> formattedPermutations = new LinkedList<String>();
for(int i = 0; i < permutations.size(); formattedPermutations.add(permutations.get(i++).toString()));
return formattedPermutations;
}
public static List<String> permute2(String str) {
return permute2("", str);
}
private static List<String> permute2(String prefix, String str) {
int n = str.length();
List<String> permutations = new LinkedList<String>();
if (n == 0) permutations.add(prefix);
else
for (int i = 0; i < n; i++)
permutations.addAll(permute2(prefix + str.charAt(i), str.substring(0, i) + str.substring(i+1, n)));
return permutations;
}
I think these two algorithms should be generally equal; however, the recursive implementation does well up to n=10, whereas permute1, the iterative solution, throws an OutOfMemoryError at n=8, where n is the input string length. Is the fact that I'm using StringBuilder and then converting to Strings a bad idea? If so, why? I thought whenever you add to a string it creates a new one, which would be bad because then java would intern it, right? So you'd end up with a bunch of intermediate strings that aren't permutations but which are stuck in the intern table.
EDIT:
I replaced StringBuilder with String, which removed the need to use StringBuilder.insert(). However, I do have to use String.substring() to build up the permutation strings, which may not be the best way to do it, but it's empirically better than StringBuilder.insert(). I did not use a char array as Alex Suo suggested because since my method is supposed to return a list of strings, I would have to convert those char arrays into strings which would induce more garbage collection on the char arrays (the reason for the OutOfMemoryError). So with this in place, both the OutOfMemoryError and slowness problems are resolved.
public static List<String> permute3(String input) {
LinkedList<String> permutations = new LinkedList<String>();
permutations.add(""+input.charAt(0));
for(int i = 1; i < input.length(); i++) {
char c = input.charAt(i);
int size = permutations.size();
for(int k = 0; k < size ; k++) {
String permutation = permutations.removeFirst(),
next;
for(int j = 0; j < permutation.length(); j++) {
next = permutation.substring(0, j) + c + permutation.substring(j); // insert c before position j, as insert(j, c) did in permute1
permutations.addLast(next);
}
permutations.addLast(permutation + c);
}
}
return permutations;
}

Firstly, since you got an OutOfMemoryError, that hints to me that you have a lot of GC going on and, as we all know, GC is a performance killer. As young-gen GC is stop-the-world, you probably get much worse performance from the GC pauses.
Looking at your code, if you dive into the actual implementation of StringBuilder, you could see that insert() is a very expensive operation involving System.arraycopy() etc and potentially expandCapacity(). Since you don't mention your n for permutation I'd assume the n<10 so you won't have the problem here - you would have memory re-allocation since default buffer of StringBuilder is of size 16 only. StringBuilder is basically an auto char array but it's not magic - whatever you need to do by coding up from scratch, StringBuilder also needs to do it.
Having said the above, if you really want to achieve maximum performance, since the length of your string array is pre-defined, why not just use a char array with length = String.length() ? That's probably the best in term of performance.
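A minimal sketch of the char array idea follows. It generates permutations by swapping in place, so the only Strings created are the results themselves. This is a standard swap-based generator chosen for illustration and is not the code from the question; the class name CharArrayPermutations is hypothetical, and like permute1 and permute2 it does not remove duplicates when the input has repeated characters.

```java
import java.util.ArrayList;
import java.util.List;

class CharArrayPermutations {
    static List<String> permute(String input) {
        List<String> result = new ArrayList<>();
        generate(input.toCharArray(), 0, result);
        return result;
    }

    // Fixes each remaining character at position start in turn,
    // recurses on the rest, then swaps back to restore the array.
    private static void generate(char[] chars, int start, List<String> out) {
        if (start >= chars.length - 1) {
            out.add(new String(chars));
            return;
        }
        for (int i = start; i < chars.length; i++) {
            swap(chars, start, i);
            generate(chars, start + 1, out);
            swap(chars, start, i);
        }
    }

    private static void swap(char[] chars, int i, int j) {
        char tmp = chars[i];
        chars[i] = chars[j];
        chars[j] = tmp;
    }

    public static void main(String[] args) {
        System.out.println(permute("abc").size()); // prints 6
    }
}
```

An ArrayList is used for the results so that indexed access stays cheap; calling get(i) in a loop over a LinkedList, as permute1 does when formatting its output, has to walk the list on every call.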

More efficient way to find all combinations?

Say you have a List of Strings or whatever, and you want to produce another List which will contain every possible combination of two strings from the original list (concatenated together). Is there any more efficient way to do this other than using a nested for loop to combine each String with all the others?
Some sample code:
for(String s: bytes) {
for(String a: bytes) {
if(!(bytes.indexOf(a) == bytes.indexOf(s))) {
if(s.concat(a).length() == targetLength) {
String combination = s.concat(a);
validSolutions.add(combination);
}
}
}
}
The time for execution gets pretty bad pretty quickly as the size of the original list of Strings grows.
Any more efficient way to do this?

You can visit each pair only once by starting the inner loop at j = i and adding both concatenation orders. Also, things like bytes.size() get evaluated at each iteration of both loops; save the value and reuse it. Calling a.length() inside the inner loop asks for the length of the same string multiple times, so you can save some runtime on that as well. Here are the updates:
int len = bytes.size();
for (int i = 0; i < len; i++) {
    String a = bytes.get(i);
    int aLength = a.length();
    for (int j = i; j < len; j++) {
        String b = bytes.get(j);
        if (b.length() + aLength == targetLength) {
            validSolutions.add(a.concat(b));
            // When i == j both orders are the same string, so add it only once.
            if (i != j) {
                validSolutions.add(b.concat(a));
            }
        }
    }
}
Edit: j = i because you want to consider a combination of a string with itself. You also need to add both a.concat(b) and b.concat(a), since each pair is visited only once and both orders are valid strings.

You can't get better than O(N^2), because there are that many combinations. But you could speed up your algorithm a bit (from O(N^3)) by removing the indexOf calls:
for (int i = 0; i < bytes.size(); i++) {
    for (int j = 0; j < bytes.size(); j++) {
        String s = bytes.get(i);
        String a = bytes.get(j);
        if (i != j && s.length() + a.length() == targetLength) {
            validSolutions.add(s.concat(a));
        }
    }
}

In addition to what Jimmy and lynxoid say, the fact that the total length is constrained gives you a further optimization. Sort your strings in order of length; then for each s you know that you require only those strings a such that a.length() == targetLength - s.length().
So as soon as you hit a string longer than that you can break out of the inner loop (since all the rest will be longer), and you can start at the "right" place, for example with a lower-bound binary search into the array.
Complexity is still O(n^2), since in the worst case all the strings are the same length, equal to half of targetLength. Typically, though, it should do somewhat better than considering all pairs of strings.
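The same length constraint can also be exploited without sorting. The sketch below groups the strings by length in a HashMap and, for each s, only visits the strings of the one matching length. This is a hash-based variant of the idea above, not the sort plus binary search described; the class name LengthBuckets and the method name combine are hypothetical, and the worst case remains O(n^2) for the same reason.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LengthBuckets {
    static List<String> combine(List<String> bytes, int targetLength) {
        // Map each length to the indices of the strings that have that length.
        Map<Integer, List<Integer>> byLength = new HashMap<>();
        for (int i = 0; i < bytes.size(); i++) {
            byLength.computeIfAbsent(bytes.get(i).length(), k -> new ArrayList<>()).add(i);
        }

        List<String> validSolutions = new ArrayList<>();
        for (int i = 0; i < bytes.size(); i++) {
            String s = bytes.get(i);
            List<Integer> partners = byLength.get(targetLength - s.length());
            if (partners == null) continue; // no string has the required length
            for (int j : partners) {
                if (i != j) { // never pair an element with itself
                    validSolutions.add(s.concat(bytes.get(j)));
                }
            }
        }
        return validSolutions;
    }

    public static void main(String[] args) {
        List<String> input = List.of("ab", "c", "de", "f");
        System.out.println(combine(input, 3).size()); // prints 8
    }
}
```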
