I'm comparing two functions for use as my permutation generator. This question touches on a lot of things: the string intern table, the pros and cons of iteration vs. recursion for this problem, and so on.
public static List<String> permute1(String input) {
    LinkedList<StringBuilder> permutations = new LinkedList<StringBuilder>();
    permutations.add(new StringBuilder("" + input.charAt(0)));
    for (int i = 1; i < input.length(); i++) {
        char c = input.charAt(i);
        int size = permutations.size();
        // Expand every existing partial permutation by inserting c at each possible position.
        for (int k = 0; k < size; k++) {
            StringBuilder permutation = permutations.removeFirst();
            for (int j = 0; j < permutation.length(); j++) {
                StringBuilder next = new StringBuilder(); // default capacity 16; may have to grow
                next.append(permutation);                 // copy the existing partial permutation
                next.insert(j, c);                        // insert c at position j
                permutations.addLast(next);
            }
            permutation.append(c);                        // reuse the original for the "append at end" case
            permutations.addLast(permutation);
        }
    }
    // Convert the StringBuilders to Strings for the return value.
    List<String> formattedPermutations = new LinkedList<String>();
    for (int i = 0; i < permutations.size(); i++) {
        formattedPermutations.add(permutations.get(i).toString());
    }
    return formattedPermutations;
}
public static List<String> permute2(String str) {
    return permute2("", str);
}

private static List<String> permute2(String prefix, String str) {
    int n = str.length();
    List<String> permutations = new LinkedList<String>();
    if (n == 0) {
        permutations.add(prefix);
    } else {
        // Fix each character as the next element of the prefix and recurse on the remainder.
        for (int i = 0; i < n; i++) {
            permutations.addAll(permute2(prefix + str.charAt(i),
                                         str.substring(0, i) + str.substring(i + 1, n)));
        }
    }
    return permutations;
}
I would expect these two algorithms to perform roughly the same; however, the recursive implementation does well up to n = 10, whereas permute1, the iterative solution, hits an OutOfMemoryError at n = 8 (where n is the input string length). Is using StringBuilder and then converting to Strings a bad idea? If so, why? I thought that whenever you add to a String it creates a new one, which would be bad because Java would then intern it, right? So you'd end up with a bunch of intermediate strings that aren't permutations but are stuck in the intern table.
EDIT:
I replaced StringBuilder with String, which removed the need for StringBuilder.insert(). I do have to use String.substring() to build up the permutation strings, which may not be the ideal approach, but it is empirically better than StringBuilder.insert(). I did not use a char array as Alex Suo suggested because my method is supposed to return a list of Strings, so I would have to convert those char arrays into Strings, which would create even more garbage to collect (the cause of the OutOfMemoryError). With this change, both the OutOfMemoryError and the slowness are resolved.
public static List<String> permute3(String input) {
    LinkedList<String> permutations = new LinkedList<String>();
    permutations.add("" + input.charAt(0));
    for (int i = 1; i < input.length(); i++) {
        char c = input.charAt(i);
        int size = permutations.size();
        for (int k = 0; k < size; k++) {
            String permutation = permutations.removeFirst();
            // Insert c at every position before the end...
            for (int j = 0; j < permutation.length(); j++) {
                String next = permutation.substring(0, j) + c + permutation.substring(j);
                permutations.addLast(next);
            }
            // ...and also append it at the end.
            permutations.addLast(permutation + c);
        }
    }
    return permutations;
}
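For reference, a throwaway harness along these lines is enough to compare the three methods (illustrative only; the numbers depend on the JVM, warm-up and heap size, and increasing n reproduces the OutOfMemoryError in permute1):
// Rough comparison harness, not a rigorous benchmark.
String input = "abcdefg"; // n = 7
long t0 = System.nanoTime();
List<String> r1 = permute1(input);
long t1 = System.nanoTime();
List<String> r2 = permute2(input);
long t2 = System.nanoTime();
List<String> r3 = permute3(input);
long t3 = System.nanoTime();
System.out.println("permute1: " + (t1 - t0) / 1000000 + " ms, " + r1.size() + " results");
System.out.println("permute2: " + (t2 - t1) / 1000000 + " ms, " + r2.size() + " results");
System.out.println("permute3: " + (t3 - t2) / 1000000 + " ms, " + r3.size() + " results");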
Firstly, the OutOfMemoryError hints that you have a lot of GC going on, and as we all know, GC is a performance killer. Since young-gen GC is stop-the-world, you are probably losing a lot of performance to those collections.
Looking at your code, if you dive into the actual implementation of StringBuilder you'll see that insert() is an expensive operation involving System.arraycopy() and potentially expandCapacity(). Since you don't mention your n, I'd assume n < 10, so you won't hit the capacity problem here; with longer strings you would also pay for memory re-allocation, because the default StringBuilder buffer is only 16 characters. StringBuilder is basically a managed char array, but it's not magic: whatever you would need to do when coding it up from scratch, StringBuilder also has to do.
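Roughly speaking, every insert(offset, c) has to shift the tail of the backing array before it can write the new character, which is what makes a single insert O(n). A simplified sketch of that cost (not the actual JDK source, and ignoring capacity growth):
// Simplified sketch of the work StringBuilder.insert(int, char) does internally:
// shift everything after 'offset' one slot to the right, then write the new char.
// The shift alone touches every character after 'offset'.
static void insertChar(char[] buf, int usedCount, int offset, char c) {
    // assumes buf.length >= usedCount + 1, i.e. no capacity growth needed
    System.arraycopy(buf, offset, buf, offset + 1, usedCount - offset);
    buf[offset] = c;
}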
Having said that, if you really want maximum performance, since the length of your strings is known up front, why not just use a char array of length String.length()? That's probably the best option in terms of performance.
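For example, one way to act on that suggestion (a sketch, not the OP's code) is the classic swap-based recursion over a single char[]; the only per-permutation allocation is the final String:
// Sketch of a swap-based permutation generator over a fixed-size char[].
// Only one working buffer of length n is ever allocated; a String is created
// only when a complete permutation is ready.
public static List<String> permuteCharArray(String input) {
    List<String> result = new LinkedList<String>();
    permuteCharArray(input.toCharArray(), 0, result);
    return result;
}

private static void permuteCharArray(char[] chars, int index, List<String> result) {
    if (index == chars.length) {
        result.add(new String(chars)); // a full permutation sits in the buffer
        return;
    }
    for (int i = index; i < chars.length; i++) {
        swap(chars, index, i);                  // choose chars[i] for position 'index'
        permuteCharArray(chars, index + 1, result);
        swap(chars, index, i);                  // backtrack
    }
}

private static void swap(char[] chars, int i, int j) {
    char tmp = chars[i];
    chars[i] = chars[j];
    chars[j] = tmp;
}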
I have 10,000 items in a set, and I need to form all possible triads from them.
I need an algorithm to efficiently find each triad.
For example:
{A,B,C,D,...}
1.AAA
2.AAB
3.AAC
4.AAD
...
all the way to ZZY, ZZZ.
The method I'm using is very inefficient: I created three nested for loops iterating through an array, which has a run time of O(N^3) and is obviously terrible for performance. What kind of algorithm and data structure would be better for this? Thank you.
A function to print all permutations of length k from a set of n characters, with repetition of characters allowed:
static void printKLengthPerm(char[] set, String prefix, int n, int k)
{
    // Base case: the prefix has reached length k, so print it.
    if (k == 0)
    {
        System.out.println(prefix);
        return;
    }
    // Otherwise append each character of the set and recurse with k - 1 positions remaining.
    for (int i = 0; i < n; i++)
    {
        String newPrefix = prefix + set[i];
        printKLengthPerm(set, newPrefix, n, k - 1);
    }
}
Calling the function to print all permutations of length 3 from the set of all capital English letters:
char[] set = new char[26];
for(int i = 0; i < 26; i++)
set[i] = (char)(i+65);
int n = set.length;
printKLengthPerm(set, "", n, 3);
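One note on the efficiency concern: the output itself contains n^k strings, so no algorithm can do less than that amount of work; the recursion above just generates them without hard-coding k nested loops.
// Illustrative check: any correct method must produce n^k strings,
// which for triads over 26 letters is 26^3 = 17,576.
int total = (int) Math.pow(26, 3);
System.out.println("Number of triads: " + total); // 17576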
I have a question which says
Given an input
ababacad
The output should have all the 'a's come first, with the rest of the characters following in their original order, i.e.
aaaabbcd
I solved it with the code below:
String temp="", first="" ;
for(int i=0;i<str.length;i++) {
if(str.charAt(i)!='a')
temp=temp+str.charAt(i);
else
first=first+str.charAt(i);
}
System.out.print(first+temp);
The output matches, but it says it is still not optimised. I guess it's already O(N) complexity. Can it be optimised further?
Optimization here can mean string operations as well as the number of iterations. So, just to be on the safe side, you can implement it using arrays. The time complexity will be O(n), which is the minimum for this problem, since you have to check every position for the character 'a'.
String input = "ababacad";
char[] array = input.toCharArray();
char[] modified = new char[array.length];
int a = 0; // index for a
int b = array.length - 1; // index for not a
for (int i = array.length - 1; i >= 0; --i) {
if (array[i] != 'a')
modified[b--] = array[i];
else
modified[a++] = array[i];
}
String output = new String(modified);
System.out.println(output);
I am trying to understand the space complexity of the following piece of code. The code compresses Strings such as "aabbbb" into "a2b4". It is Question 5, Chapter 1 from Cracking the Coding Interview, 5th edition (2013), and the code is taken from the solutions.
public static String compressBetter(String str) {
    int size = countCompression(str);
    if (size >= str.length()) {
        return str;
    }
    StringBuffer mystr = new StringBuffer();
    char last = str.charAt(0);
    int count = 1;
    for (int i = 1; i < str.length(); i++) {
        if (str.charAt(i) == last) {
            count++;
        } else {
            mystr.append(last);
            mystr.append(count);
            last = str.charAt(i);
            count = 1;
        }
    }
    mystr.append(last);
    mystr.append(count);
    return mystr.toString();
}
where
public static int countCompression(String str) {
    if (str == null || str.isEmpty()) return 0;
    char last = str.charAt(0);
    int size = 0;
    int count = 1;
    for (int i = 1; i < str.length(); i++) {
        if (str.charAt(i) == last) {
            count++;
        } else {
            last = str.charAt(i);
            size += 1 + String.valueOf(count).length();
            count = 1;
        }
    }
    size += 1 + String.valueOf(count).length();
    return size;
}
According to the author, compressBetter has O(N) space complexity. Why is that not O(1)?
In every run of countCompression we hold last, size and count, and similarly for compressBetter (which holds the countCompression variables plus mystr, last and count). My understanding of space complexity is "how much memory the algorithm needs/holds at any time"; in other words, space complexity, unlike time complexity, is not cumulative.
Note that the author considers only what people refer to as "auxiliary space complexity" (excluding the space needed to store the input), as in the example above. Also, as far as I know, there is no entry about this in the book's errata.
UPDATE: My source of confusion was the following example (Question 1.1 in the same book):
public static boolean isUniqueChars2(String str) {
    boolean[] char_set = new boolean[256];
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if (char_set[val]) return false;
        char_set[val] = true;
    }
    return true;
}
which is O(1) despite allocating a 256-element boolean array; I had thought that allocations don't matter when calculating space complexity. In reality it is O(1) because the space needed is constant and independent of the input size (unlike the mystr StringBuffer).
Just converting my previous comment into an answer: you are holding a StringBuffer whose size can be proportional to that of the input String. Just think about the worst case, an input String with no consecutive repeated characters.
You are asking about the space complexity of compressBetter, which includes a call to countCompression, but also performs additional work.
While the space complexity of countCompression is indeed O(1), compressBetter has linear space complexity (i.e. O(N)) in the worst case (where no two consecutive characters of the input String are equal), since it produces a StringBuffer of 2N characters in that case.
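To make that concrete, using the two methods quoted above:
// Worst case: no two consecutive characters are equal, so the compressed form
// would need 2N characters; the countCompression guard then just returns the input.
System.out.println(countCompression("abcdef"));  // 12, i.e. 2 * 6
System.out.println(compressBetter("abcdef"));    // "abcdef" (returned unchanged, since 12 >= 6)

// Even when compression helps, the StringBuffer is still proportional to N:
System.out.println(compressBetter("aaabbbccc")); // "a3b3c3", still O(N) characters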
I'm trying to search through an ArrayList in Java and create a histogram of string lengths vs. how frequently each length occurs in large text files. I've come up with a brute-force algorithm, but it's much too slow to be of use on large data files. Is there a more efficient way of processing an ArrayList? I've included the brute-force method I came up with.
for (int i = 0; i < (maxLen + 1); i++)
{
    int hit = 0;
    for (int j = 0; j < list.size(); j++)
    {
        if (i == list.get(j).length())
            ++hit;
        histogram[i] = hit;
    }
}
This is terribly inefficient.
How about, instead of looping through each possible length value and then through each available word, you simply loop through the words in the document and count their lengths?
For example:
Map<Integer, Integer> frequencies = new HashMap<Integer, Integer>();
for (int i = 0; i < list.size(); i++) {
    String thisWord = list.get(i);
    Integer theLength = (Integer) (thisWord.length());
    if (frequencies.containsKey(theLength)) {
        frequencies.put(theLength, new Integer(frequencies.get(theLength).intValue() + 1));
    }
    else {
        frequencies.put(theLength, new Integer(1));
    }
}
Then, if the key does not exist in the HashMap, you know no words of that length exist in the document. If the key does exist, you can look up exactly how many times that occurred.
Note: some aspects of this code example are written the way they are to avoid any additional confusion about boxing and unboxing. It is possible to write it slightly more cleanly, and I would certainly do so in a production environment. It also assumes you don't know the minimum or maximum word lengths in advance (and is thus slightly more flexible, scalable, and catch-all). Otherwise, simply declaring a primitive array will work just as well (see Jon Skeet's answer).
For a cleaner version that takes advantage of autoboxing:
Map<Integer, Integer> frequencies = new HashMap<Integer, Integer>();
for (int i = 0; i < list.size(); i++) {
    String thisWord = list.get(i);
    if (frequencies.containsKey(thisWord.length())) {
        frequencies.put(thisWord.length(), frequencies.get(thisWord.length()) + 1);
    }
    else {
        frequencies.put(thisWord.length(), 1);
    }
}
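If you can use Java 8 or newer, Map.merge collapses the contains/get/put sequence into a single call; a small variant of the loop above:
// Java 8+ variant: merge() stores 1 the first time a length is seen,
// otherwise it adds 1 to the count already stored for that length.
Map<Integer, Integer> frequencies = new HashMap<>();
for (String word : list) {
    frequencies.merge(word.length(), 1, Integer::sum);
}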
Why don't you just loop over the list once?
int[] histogram = new int[maxLen + 1]; // All entries will be 0 to start with
for (String text : list) {
    if (text.length() <= maxLen) {
        histogram[text.length()]++;
    }
}
This is now just O(N).
I am trying to find all substrings within a given string. For a string like rymis, the substrings would be [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis]. According to Wikipedia, a string of length n has n * (n + 1) / 2 total substrings.
These can be generated with the following snippet of code:
final Set<String> substring_set = new TreeSet<String>();
final String text = "rymis";
for (int iter = 0; iter < text.length(); iter++)
{
    for (int ator = 1; ator <= text.length() - iter; ator++)
    {
        substring_set.add(text.substring(iter, iter + ator));
    }
}
This works for small String lengths but obviously slows down for larger ones, as the algorithm is roughly O(n^2).
I have also been reading up on suffix trees, which can do insertions in O(n), and noticed that the same substrings could be obtained by repeatedly inserting the string after removing one character from the right until it is empty. That should be about O(1 + … + (n-1) + n), which sums to n(n+1)/2 = (n^2 + n)/2, again roughly O(n^2). There also seem to be suffix trees that can do insertions in log2(n) time, which would be a factor better at O(n log2(n)).
Before I delve into suffix trees: is this the correct route to take, is there another algorithm that would be more efficient for this, or is O(n^2) as good as it gets?
I am fairly sure you can't beat O(n^2) for this as has been mentioned in comments to the question.
I was interested in different ways of coding that so I made one quickly, and I decided to post it here.
The solution I put here isn't asymptotically faster, I don't think, but counting the inner and outer loops there are fewer of them, and there are no duplicate insertions.
String str = "rymis";
ArrayList<String> subs = new ArrayList<String>();
while (str.length() > 0) {
subs.add(str);
for (int i=1;i<str.length();i++) {
subs.add(str.substring(i));
subs.add(str.substring(0,i));
}
str = str.substring(1, Math.max(str.length()-1, 1));
}
This is an inverted version of your example, but still O(n^2).
string s = "rymis";
ArrayList<string> al = new ArrayList<string>();
for(int i = 1; i < s.length(); i++){//collect substrings of length i
for(int k = 0; k < s.length(); k++){//start index for sbstr len i
if(i + k > s.length())break;//if the sbstr len i runs over end of s move on
al.add(s.substring(k, k + i));//add sbstr len i at index k to al
}
}
Let me see if I can post a recursive example. I started on a couple of recursive attempts and instead came up with this iterative approach using dual sliding windows, as a sort of improvement on the method above. I had a recursive example in mind but was having trouble reducing the tree size.
string s = "rymis";
ArrayList<string> al = new ArrayList<string>();
for(int i = 1; i < s.length() + 1; i ++)
{
for(int k = 0; k < s.length(); k++)
{
int a = k;//left bound window 1
int b = k + i;//right bound window 1
int c = s.length() - 1 - k - i;//left bound window 2
int d = s.length() - 1 - k;//right bound window 2
al.add(s.substring(a,b));//add window 1
if(a < c)al.add(s.substring(c,d));//add window 2
}
}
There was a mention of ArrayList affecting performance, so this next one uses more basic structures.
string s = "rymis";
StringBuilder sb = new StringBuilder();
for(int i = 1; i < s.length() + 1; i ++)
{
for(int k = 0; k < s.length(); k++)
{
int a = k;//left bound window 1
int b = k + i;//right bound window 1
int c = s.length() - 1 - k - i;//left bound window 2
int d = s.length() - 1 - k;//right bound window 2
if(i > 1 && k > 0)sb.append(",");
sb.append(s.substring(a,b));//add window 1
if(a < c){
sb.append(",");
sb.append(s.substring(c,d));//add window 2
}
}
}
string s = sb.toString();
String[] sArray = s.split("\\,");
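A quick sanity check on the result: a string of length n has n * (n + 1) / 2 substrings, so "rymis" should yield 15 entries.
// Sanity check (assuming the bounds above): 5 * 6 / 2 = 15 substrings for "rymis".
System.out.println(sArray.length);                     // 15
System.out.println(java.util.Arrays.toString(sArray)); // all 15 substrings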
I am not sure about the exact algorithm but you may look into Ropes:
http://en.wikipedia.org/wiki/Rope_(computer_science)
In summary, ropes are better suited when the data is large and frequently modified.
I believe Rope outperforms String for your problem.