What is the time complexity of StringBuilder.append() in Java?

A program I'm working on converts an array of integers to a string using StringBuilder. I'm trying to determine the time complexity of this approach.

Check out: https://stackoverflow.com/a/7156703/7294647
Basically, the time complexity of StringBuilder#append is not specified: it depends on the implementation, so you usually shouldn't have to worry about it.
There might be a more efficient way of approaching your int[]-to-String conversion, depending on what you're actually trying to achieve.

If the StringBuilder needs to increase its capacity, that involves copying the entire character array to a new array. You can avoid this by initially setting the capacity so it won't have to do this. (This should be easy since you know the length of the int array and the maximum number of characters in the String representation of an int.)
If you avoid the need to increase the capacity, the complexity of a single append would seem to just be O(n), where n is the length of the String being appended. When you append, you're just copying the character array from the String to the end of the character array in the StringBuilder.
(Yes, it depends on the implementation, but it would be a rather poor implementation of StringBuilder if it couldn't append in O(n) time.)
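As a minimal sketch of the presizing idea above: the longest decimal form of an int is "-2147483648" (11 characters), so an upper bound on the output length can be computed before the first append. The class name IntArrayJoiner, the method join and the separator parameter are assumptions made for illustration, not part of the original question.

```java
final class IntArrayJoiner {
    // "-2147483648" is the longest decimal representation of an int: 11 characters.
    private static final int MAX_INT_CHARS = 11;

    // Joins the values with the given separator, using a presized StringBuilder
    // so that the internal character array never has to grow.
    static String join(int[] values, String separator) {
        // Upper bound on the result length. For extremely large arrays this
        // product could overflow int; that case is ignored in this sketch.
        int capacity = values.length * (MAX_INT_CHARS + separator.length());
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(values[i]); // append(int) writes the decimal digits
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new int[] {1, -20, 300}, ", ")); // prints "1, -20, 300"
    }
}
```

The bound is deliberately generous. It trades some unused capacity for the guarantee that no intermediate copy of the buffer is needed.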

Related

Most efficient way to search integer array in Java

I'm looking for the most efficient way to determine whether a specific value exists in a small (16 element) array of integers in Java. The array is unsorted.
Options:
1. A boring but reliable for loop
2. Sort the array then Arrays.binarySearch(arr, targetVal)
3. List.contains method - example Arrays.asList(arr).contains(targetVal)
4. Something else.
Option 3 must have some overhead in "converting" to a List but I could use a List throughout rather than an array if that would be better overall. I've no feel for how List performs speed wise.
Because the array is unsorted, any search on it has complexity O(n).
You could try your second option, but then you pay O(n*log(n)) for the sort plus O(log(n)) for the search.
With such a small array that you search only once, a simple loop is better, because it is hard to predict how long the conversion to a List will take, which sorting algorithm will be used, and so on.
A plain loop is a good choice.
FYI: a Stream will not be efficient in your case.
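The plain loop recommended above could look like the following sketch. The class and method names are illustrative, and the array is assumed to be a primitive int[].

```java
final class SmallArraySearch {
    // Linear scan: at most arr.length comparisons, no allocation, no sorting.
    static boolean contains(int[] arr, int target) {
        for (int value : arr) {
            if (value == target) {
                return true; // stop at the first match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] data = {7, 3, 9, 1};
        System.out.println(contains(data, 9)); // prints "true"
        System.out.println(contains(data, 4)); // prints "false"
    }
}
```

One detail worth keeping in mind about the third option: if arr is a primitive int[], Arrays.asList(arr) yields a List<int[]> with a single element, so contains(targetVal) would not search the integers at all. That option only works as written with an Integer[].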

Question(s) about time complexity of array "resizing" in Java

NOTE: As the title already hints, this question is not about the specific java.util.ArrayList implementation of an array-based list, but rather about the raw arrays themselves and how they might behave in a "pure" (meaning completely unoptimized) array-based list implementation. I chose to mention java.util.ArrayList because it is the most prominent example of an array-based list in Java, although it is technically not "pure", as it utilizes preallocation to reduce the operation time of add(). If you want to know why I am asking this specific question without being interested in the java.util.ArrayList() preallocation optimization, I added a little explanation of my use case below.
It is generally known that you can access elements in array-based lists (like Java's ArrayList<E>) with a time complexity of O(1), while adding elements to that list will take O(n). With linked lists, it is the other way round (for a doubly linked list, you could optimize the access to half the execution time).
The reason why adding elements to an array-based list takes O(n) is that an array cannot simply be resized, but has to be reallocated and re-filled. The easiest way to do this would be:
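String[] arr = new String[n];
//...
String newElem = "foo";
String[] newArr = new String[n + 1]; // one extra slot for the new element
int i = 0;
for (String elem : arr) {
    newArr[i++] = elem; // copy every existing element: n assignments
}
newArr[i] = newElem; // i == n here, the last slot
arr = newArr;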
The time complexity O(n) is clearly visible thanks to the for loop. But there are other ways to copy arrays in Java, for example System.arraycopy().
Sticking to the vanilla for loop solution, even shrinking an array will take O(n), because an array has a fixed size and in order to "shrink" it, you'd have to copy all elements to be retained to a new, smaller array.
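For comparison with the manual loop, here is a hedged sketch of the same two operations written with java.util.Arrays.copyOf. The class ArrayResize and its method names are made up for illustration. Note that Arrays.copyOf always returns a newly allocated array, because a Java array's length is fixed once it is created, so both helpers still copy every retained element.

```java
import java.util.Arrays;

final class ArrayResize {
    // "Grow by one": allocates a new array of length + 1 and copies all elements.
    static String[] append(String[] arr, String elem) {
        String[] grown = Arrays.copyOf(arr, arr.length + 1);
        grown[arr.length] = elem; // the padded slot was null until now
        return grown;
    }

    // "Shrink": allocates a new, shorter array and copies the first newLength elements.
    static String[] truncate(String[] arr, int newLength) {
        return Arrays.copyOf(arr, newLength);
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b"};
        String[] grown = append(arr, "foo");
        System.out.println(Arrays.toString(grown));              // prints "[a, b, foo]"
        System.out.println(Arrays.toString(truncate(grown, 1))); // prints "[a]"
        System.out.println(grown == arr);                        // prints "false"
    }
}
```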
So, here are my questions concerning such array operations and their time complexity:
While the vanilla for loop will always take O(n), is it possible that System.arraycopy() optimizes the "add" operation if there is enough space in the memory to expand the array in place, meaning that it would leave the original array at its place and just add the new element at the end of it?
As the shrinking operation could always be executed with O(1) in theory, does System.arraycopy() always optimize this operation to O(1)?
If System.arraycopy() is not capable of using those optimizations, is there any other way in Java to actually utilize those optimizations which are possible in theory OR will array "resizing" always take O(n), no matter under which circumstances?
TL;DR is there any situation in which the "resizing" of an array in Java will take less than O(n)?
Additional information:
I am using openJDK11 (newest release), but if the answer turns out to be JVM-dependent, I'd like to know how other JVMs would behave in comparison.
For the curious ones who want to know what I want to do with this information:
I am working on a new java.util.List implementation, namely a hybrid list that can store data in an array and in a linked buffer. On certain occasions, the buffer will be flushed into the array, which of course requires that the existing array is resized. But apart from this idea, I want to utilize as many other optimizations on the array part as possible.
To avoid array resizing in general, I experimented with the idea of letting the array persist in a constant size, but managing the "valid" range of it with some other fields. Meaning that if you were to pop the last element of the array, it would not shrink the array but rather the range of valid elements. Then, when inserting new elements in the array part, the former invalid section can be used to shift values into, basically reusing the space that was formerly used by a now deleted element. If the inserting operations exceed the actual array size, elements can still be transferred to the linked buffer to avoid resizing.
To further optimize this, I chose to use the middle of the array as a pivot when deleting certain elements. Now the valid range might not start at the beginning of the array anymore. Basically this means if you delete an element to the left of the pivot, all elements between the start of the valid range and the deleted element get shifted towards the pivot, to the right. Removing an element to the right of the pivot works accordingly. So, after some removals, the array could look like this:
[null null|elem0 elem1 elem2||elem3 elem4 elem5|null null null]
(Where the | at the beginning and at the end mark the valid range and the || marks the pivot)
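A simplified sketch of the "valid range" bookkeeping described above might look like this. It is not the author's implementation: the class RangedArray and all of its members are hypothetical, the pivot-based shifting on removal is left out, and a full array is simply reported to the caller instead of spilling into a linked buffer.

```java
import java.util.NoSuchElementException;

final class RangedArray<E> {
    private final Object[] slots; // fixed size, never reallocated
    private int start;            // index of the first valid element (inclusive)
    private int end;              // index just past the last valid element (exclusive)

    RangedArray(int capacity) {
        slots = new Object[capacity];
        start = end = capacity / 2; // begin with an empty range at the middle
    }

    // Returns false when there is no free slot on the right side.
    boolean addLast(E e) {
        if (end == slots.length) {
            return false;
        }
        slots[end++] = e;
        return true;
    }

    // Returns false when there is no free slot on the left side.
    boolean addFirst(E e) {
        if (start == 0) {
            return false;
        }
        slots[--start] = e;
        return true;
    }

    // O(1): only the range shrinks, the array keeps its size.
    @SuppressWarnings("unchecked")
    E removeLast() {
        if (start == end) {
            throw new NoSuchElementException();
        }
        E e = (E) slots[--end];
        slots[end] = null; // drop the reference so it can be garbage-collected
        return e;
    }

    int size() {
        return end - start;
    }
}
```

With this layout, popping from either end never touches the other elements, and the freed slot is available again for the next insertion on that side.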
So, how is this all related to my question?
All of those optimizations build on the claim that array resizing is expensive in time, namely O(n). Therefore array resizing is avoided whenever possible. Those optimizations might sound neat, but the code implementing them can get quite messy, especially when implementing the batch operations (addAll(), removeAll(), retainAll()...). So, if it turns out that the array resizing operation itself can be less expensive in some cases (especially shrinking), I would cut out a lot of those optimizations, which would then be useless, making the code a lot simpler in the process.
So, before sticking to my optimization ideas and experiments, I'd like to know whether they are even needed.

Appending Strings vs appending chars in Java

I am trying to solve an algorithmic task where speed is of primary importance. In the algorithm, I am using a DFS search in a graph and in every step, I add a char and a String. I am not sure whether this is the bottleneck of my algorithm (probably not) but I am curious what is the fastest and most efficient way to do this.
At the moment, I use this:
transPred.symbol + word
I think that there might be a better alternative than the "+" operator, but most String methods only work with other Strings (would converting my char into a String and using one of them make a difference?).
Thanks for answers.
EDIT:
for (Transition transPred : state.transtitionsPred) {
walk(someParameters, transPred.symbol + word);
}
transPred.symbol is a char and word is a string
A very common problem / concern.
Bear in mind that each String in java is immutable. Thus, if you modify the string it actually creates a new object. This results in one new object for each concatenation you're doing above. This isn't great, as it's simply creating garbage that will have to be collected at some point.
If your graph is very large, that garbage collection might happen during your traversal logic, and it may slow down your algorithm.
To avoid creating a new String for each concatenation, use a StringBuilder. You can declare one outside your loop and then append each character with StringBuilder.append(char). This does not incur a new object creation for each append() operation.
After your loop you can call StringBuilder.toString(). This creates a new object (the String), but only one for your entire loop.
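The pattern described above can be sketched as follows. The class CharAppendDemo and the symbols parameter are assumptions for illustration; in the question's code the characters would come from transPred.symbol instead.

```java
final class CharAppendDemo {
    // Builds one String from many chars with a single StringBuilder.
    static String build(char[] symbols) {
        // Presizing is optional; it only avoids internal buffer growth.
        StringBuilder sb = new StringBuilder(symbols.length);
        for (char c : symbols) {
            sb.append(c); // mutates the builder, no new String per step
        }
        return sb.toString(); // the only String created
    }

    public static void main(String[] args) {
        System.out.println(build(new char[] {'d', 'f', 's'})); // prints "dfs"
    }
}
```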
Since you replace one char in the string at each iteration I don't think that there is anything faster than a simple + append operation. As mentioned, Strings are immutable, so when you append a char to it, you will get a new String object, but this seems to be unavoidable in your case since you need a new string at each iteration.
If you really want to optimize this part, consider using something mutable like an array of chars. This would allow you to replace the first character without any excessive object creation.
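To illustrate the mutable char array idea, here is a sketch that stands in for the DFS from the question: it enumerates all words of a fixed length over a small alphabet, writing each symbol in front of the suffix already stored in a shared buffer. Everything here (the class PrependBuffer, the walk signature, the alphabet) is hypothetical, and it assumes the maximum word length is known up front.

```java
import java.util.ArrayList;
import java.util.List;

final class PrependBuffer {
    // buf[pos..] already holds the current suffix; each call fills buf[pos - 1].
    static void walk(char[] alphabet, char[] buf, int pos, List<String> out) {
        if (pos == 0) {
            out.add(new String(buf)); // a String is created only for finished words
            return;
        }
        for (char c : alphabet) {
            buf[pos - 1] = c; // overwrite in place instead of building c + word
            walk(alphabet, buf, pos - 1, out);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        walk(new char[] {'a', 'b'}, new char[2], 2, words);
        System.out.println(words); // prints "[aa, ba, ab, bb]"
    }
}
```

The intermediate steps allocate nothing. Whether that matters in practice depends on how many partial words the real traversal produces.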
Also, I think you're right when you say that this probably isn't your bottleneck. And remember that premature optimization is the root of all evil etc. (Don't mind the irony that the most popular example of good optimization is avoiding excessive string concatenation).

Why is String class in java implemented using char[], offset and length?

Why does the String class in Java have char[] value, int offset and int count fields? What is their purpose and what task do they accomplish?
The char[] array holds the array of characters making up that string.
The offset and count are used for the String.substring() operation. When you take a substring of a string, the resultant String references the original character array, but stores an associated offset and length (this is known as a flyweight pattern and is a commonly used technique to save memory).
e.g. "ABCDEF".substring(1, 2)
would reference the original array of A,B,C,D,E,F but set offset to 1 and length to 1 (since the substring method uses start and end indices). Note you can do this trivially since a String is immutable: the shared character array is never modified.
Note: This has changed recently (7u6, I believe) and is no longer true in recent versions. I suspect this is due to the realisation that this optimisation isn't really used much.
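To make the sharing mechanism concrete, here is a small model of it. This is not the JDK's String source: SharedSlice is a hypothetical class that only mirrors the three fields named in the question (value, offset, count).

```java
final class SharedSlice {
    private final char[] value; // backing array, possibly shared with other slices
    private final int offset;   // index of this slice's first character in value
    private final int count;    // number of characters in this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): no characters are copied, only the offset and count change.
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    boolean sharesArrayWith(SharedSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice("ABCDEF");
        SharedSlice part = whole.substring(1, 2);
        System.out.println(part);                        // prints "B"
        System.out.println(part.sharesArrayWith(whole)); // prints "true"
    }
}
```

The sharing only works safely because nothing ever writes to the backing array after construction.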
They allow passing back and forth an array as a backing for routines that are primarily interested in a portion of the array. This allows one to not worry about constructing tons of small arrays, avoiding the costs associated with object construction for particular operations.
For example, one might use an array as the input buffer, but then need additional arrays to handle the chunked-up characters within that buffer. With the triple arguments of array, offset, and count, you can "simulate" reading from the middle of the buffer without the need to create a new (secondary) array.
This is important, as while you might reasonably want an array (an object in java) to hold the input characters, you probably don't want to allocate and garbage collect thousands of arrays (and copy the characters into them) to pass the data into something that only expects a single word, as delimited by white space (hey, it's just an example).

Java Memory Saving Techniques?

I have this 4 Dimensional array to store String values which are used to create a map which is then displayed on screen with the paintComponent. I have read many articles saying that using huge arrays is very inefficient. (Especially since the array dimensions are 16x16x3x3) I was wondering if there was any way to store the string values (I use them as ID values) differently to save memory or reduce fetching time. If you have any ideas or methods I would appreciate it. Thanks!
Well, if your matrix is full, that is every element contains data, then I believe an array is optimally efficient. But if your matrix is sparse, you could look into using more Linking-based data types.
The first thing I would do is to not use strings as IDs; use ints instead. It'll reduce the size of your structure a lot.
Also, that array really isn't that big, I wouldn't worry about efficiency if that's the only data structure you have. It's only 2304 elements large.
First off, 16*16*3*3 = 2304 - quite modest really. At this size I'd be more worried about the confusion likely to be caused by a 4D array than the size it is taking!
As others have said, if it is fully populated, arrays are OK. If it has gaps, an ArrayList or similar would be better.
If the Strings are just IDs, why not store an enum (or even Integers) instead of a string?
Keep in mind that the String values are separate from the array. The array itself takes the same memory space regardless of what string values it links to. Accessing a specific address in your array will take the same amount of time regardless of what type of object you have saved there, or what the value of that object is.
However, if you find that many of your string values represent exactly the same string, you can avoid having multiple copies of the same string by leveraging String.intern(). If you store the interned string, and you don't have any other references to the non-interned string, that frees the non-interned string up to be garbage-collected. Your array will then have multiple entries that point to the same memory space, rather than different memory addresses with equivalent string objects.
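A minimal sketch of the interning approach just described, assuming the IDs arrive as separately allocated String objects. The class InternDemo and the method internAll are illustrative names.

```java
final class InternDemo {
    // Returns a new array whose entries refer to the canonical (interned) copies.
    static String[] internAll(String[] ids) {
        String[] result = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            // intern() returns the pooled instance for equal string contents
            result[i] = (ids[i] == null) ? null : ids[i].intern();
        }
        return result;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents
        String[] ids = {new String("grass"), new String("grass")};
        System.out.println(ids[0] == ids[1]);           // prints "false"
        String[] interned = internAll(ids);
        System.out.println(interned[0] == interned[1]); // prints "true"
    }
}
```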
See also:
Is it good practice to use java.lang.String.intern()?
http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html
Depending on the requirements of your IDs, you may also want to look into using a different data structure than strings. For example, while the array itself would be the same size, storing int values would avoid the need to allocate extra space for each individual entry.
Also, a 4-dimensional array may not be the best data structure for your needs in the first place. Can you describe why you've chosen this data structure for what you're trying to represent?
The Strings only take up the space of a reference in each array element. There could be a savings if the strings come from a very small set of values. A more important question is whether your 4-dimensional arrays are sparse or mostly filled. If you have very few values actually specified, then you might have a big savings replacing the 4-d array with a Map from the indices to the String. Let me know if you want a code sample.
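One possible shape for such a Map-based replacement is sketched below. It assumes the 16x16x3x3 dimensions from the question; the class SparseGrid and its methods are hypothetical. The four indices are packed into a single int key so that only populated cells occupy memory.

```java
import java.util.HashMap;
import java.util.Map;

final class SparseGrid {
    // Dimensions taken from the question: 16 x 16 x 3 x 3.
    private static final int D1 = 16, D2 = 16, D3 = 3, D4 = 3;

    private final Map<Integer, String> cells = new HashMap<>();

    // Maps (a, b, c, d) to a unique int in the range 0..2303.
    private static int key(int a, int b, int c, int d) {
        if (a < 0 || a >= D1 || b < 0 || b >= D2 || c < 0 || c >= D3 || d < 0 || d >= D4) {
            throw new IndexOutOfBoundsException();
        }
        return ((a * D2 + b) * D3 + c) * D4 + d;
    }

    void put(int a, int b, int c, int d, String id) {
        cells.put(key(a, b, c, d), id);
    }

    // Returns null for cells that were never set.
    String get(int a, int b, int c, int d) {
        return cells.get(key(a, b, c, d));
    }

    int populatedCells() {
        return cells.size();
    }
}
```

For a mostly filled grid this is likely to use more memory than the plain array, since every entry carries map overhead, so it only pays off when few cells are set.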
Do you actually have a 4D array of 16x16x3x3 (i.e. 2k) string objects? That doesn't sound that big to me. An array is the most efficient way to store a collection of objects, in terms of memory. An ArrayList can be slightly less efficient (up to 50% wasted space).
The only other way I can think of is to store the Strings end-to-end in one giant String and then use substring() to get the bit you need from that, but you would still need to store the indexes somewhere.
Are you running out of memory? If so, check that the Strings in your array are the size you think they are. In Java versions before 7u6, the backing array of a String instance can be much larger than the string itself: if you call substring() on a 1 GB string, the returned string instance shares the 1 GB array of the first string, so it will keep that array from being GC'd longer than you might expect.
