Long time ago I watched a video lecture from the Princeton Coursera MOOC: Introduction to algorithms, which can be found here. It explains the cost of resizing an ArrayList like structure while adding or removing the elements from it. It turns out that if we want to supply resizing to our data structure we will go from O(n) to amortized O(n) for add and remove operations.
I have been using Java ArrayList for a couple of years. I've been always sure that they grow and shrink automatically. Only recently, to my great surprise, I was proven wrong in this post. Java ArrayLists do not shrink (even though, of course they do grow) automatically.
Here are my questions:
In my opinion providing shrinking in ArrayLists does not make any harm as the performance is already amortized O(n). Why did Java creators did not include this feature into the design?
I know that other data structures like HashMaps also do not shrink automatically. Is there any other data structure in Java which is build on top of arrays that supports automatic shrinking?
What are the tendencies in other languages? How does automatic shrinking look like in case of lists, dictionaries, maps, sets in Python/C# etc. If they go in the opposite direction to what Java does, then my question is: why?
The comments already cover most of what you are asking. Here some thoughts on your questions:
When creating a structure like the ArrayList in Java, the developers make certain decisions regarding runtime / performance. They obviously decided to exclude shrinking from the “normal” operations to avoid the additional runtime, which is needed.
The question is why you would want to shrink automatically. The ArrayList does not grow that much (the factor is about 1.5; newCapacity = oldCapacity + (oldCapacity >> 1), to be exact). Maybe you also insert in the middle and not just append at the end. Then a LinkedList (which is not based on an array -> no shrinking needed) might be better. It really depends on your use case. If you think you really need everything an ArrayList does, but it has to shrink when removing elements (I doubt you really need this), just extend ArrayList and override the methods. But be careful! If you shrink at every removal, you are back at O(n).
The C# List and the C++ vector behave the same concerning shrinking a list at removal of elements. But the factors of automatic growing vary. Even some Java-implementations use different factors.
Another issue with automatic shrinking is that you could get into really horrible 'pathological' situations where each insert and delete to the list causes the growing or shrinking the backing array.
For example, if the backing array's initial capacity is 10, such that the array would grow upon the insertion of the 11th element (capacity would grow to 15), a natural implementation would be to shrink the backing array once the list size dropped below 11. If you had a list whose length kept changing between 10 and 11, you'd be constantly changing the backing array. Not only would that add a runtime overhead, but you could start putting lots of pressure on the garbage collector if every operation resulted in either 10 or 15 objects becoming garbage.
Although arraylist with shrinking is still amortized O(n) time complexity, it involves more operation.
By shrinking you just save some memory space by adding some calculation, which is not a wise decision because the Moore's law says computer space double every 2 years. So the time is more valuable in algorithm than space.
Related
I have an ArrayList filled with 1.5 million objects of some class. When I sort this list by usage of the Collection.sort method the allocated memory of the JVM increases dramatically.
So my questions are:
Is that normal? What could be reasons for that? Is this a matter of the garbage collector working too slowly or not being started often enough? Do the objects in the list have to fulfill certain specifications to consume less memory during sort (besides not containing that much data)?
Thx!
In order to sort a List, the default sorting implementation first creates an array-copy of all elements that are to be sorted. This causes the additional heap consumption that you observe while sorting. This copying is necessary since a generic sorting algorithm has no knowledge of the list's structure, for example if it is random-access or not.
For Java 8, the sorting implementation was however changed to be delegated to each implementation of a List. This became possible with using default methods. For an ArrayList, this additional overhead could be removed by implementing a more efficient sorting algorithm. An upgrade to Java 8 would therefore most likely resolve your problem.
There is nothing wrong with garbage collection for your problem. Large arrays are unfortunately heavy to handle because they probably do not fit into the young generation and can eventually trigger a full collection.
Furthermore, as mentioned in the comments, the actual sorting is performed via Tim Sort since Java 7 by the Arrays::sort implementation. Tim sort requires additional heap space. From the javadoc:
Temporary storage requirements vary from a small constant for nearly sorted
input arrays to n/2 object references for randomly ordered input arrays.
If this is not applicable for your use case, you can switch back to the previous merge-sort implementation by setting the system property java.util.Arrays.useLegacyMergeSort to true.
After all, Tim sort is however still more efficient than merge sort as merge sort requires another full array copy.
In C/C++ we have realloc which will efficiently allocate additional space for an existing collection. I guess it is sub linear (or even constant) in complexity.
Is there a way to achieve the same in Java? Here are the items that I looked at,
Array resize is not possible,
Copying an array to another of bigger size is linear in complexity. Looked at both System.arrayCopy as well as Arrays.copyOf
ArrayList must be same as point 2 above.
Note : My requirement is to expand possibly an extremely large array to even further.
realloc is likely to be O(n) in practice, since it sometimes/often involves memory copying. In that sense, it's equivalent in theoretical complexity to allocating a new array in Java.
Now Java always zeros the newly allocated memory which may take it a bit longer, but OTOH the GC has insanely fast memory allocations so it may even be faster than realloc in some cases. I'd expect a strategy that involves allocating new arrays in Java to be overall roughly comparable in speed to realloc. Possibly Java is better for smaller arrays, C/C++ would have the edge for big arrays, but YMMV. You'd have to benchmark in your particular implementation and workload to be sure.
So overall:
Don't worry about it, just reallocate new arrays in Java
If you do this a lot, be sure to recreate arrays with more space than you need so that you don't need to reallocate with each single element added (this is what the Java ArrayList does internally.
Final but important point: unless you are writing very low level code, you probably shouldn't be worrying about this anyway. Just use one of the fine collection classes that already exist (Java Collections, Google Collections, Trove etc.) and let them handle all of this stuff for you.
I am making a program in Java in which a ball bounces around on the screen. The user can add other balls, and they all bounce off of each other. My question lies in the storage of the added balls. At the moment, I am using an ArrayList to store them, and every time the space bar is pressed, a new ball class is created and added to an Array List. Is this the most efficient way of doing things? I don't specify the size of the Array List at the beginning, so is it inefficient to have to allocate a new space on the array every time the user wants a new ball, even if the ball count will get up in the hundreds? Is there another class I could use to handle this in a more efficient manner?
Thanks!
EDIT:
Sorry, I should have been more clear. I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other. I do access one ball the most often (the ball which the user can control with the arrow keys, another feature of the game), but the user can choose to switch control balls. Balls are never removed. So, I am performing some fairly complex calculations (I use my own vector class to move them off of each other every time there is a collision) on the balls very often.
Measure it and find out! In all seriousness, often times the best way to get answers to these questions is to set up a benchmark and swap in different collection types.
I can tell you that it won't allocate new space every time you add a new item to the ArrayList. Extra space is allocated so that it has room to grow.
LinkedList is another List option. It is super cheap to add items, but random access (list.get(10)) is expensive. Sets could also be good if you don't need ordered access (though there are ordered sets, too), and you want a Map implementation if you're accessing them by some sort of key/id. It really all depends on how you're using the collection.
Update based on added details
It sounds like you are mostly doing sequential reads through the entire list. In that scenario, a LinkedList is probably your best choice. Though again, if you only expose the List interface to the rest of your code (or even a more general Collection), you can easily swap in different implementations and actually measure the difference.
ArrayList is a highly optimized and very efficient wrapper on top of a plain Java array. A small timing overhead comes from copying array elements, which happens when the allocated size is less than required number of elements. When you grow the array into a few hundreds of items, the copying will happen less than ten times, so the price you pay for not knowing the size in advance is very small. You can further reduce that overhead by suggesting an initial size for your ArrayList.
Removing from the middle of the ArrayList does take linear time. If you plan to remove items and/or insert them in the middle of the list frequently, this may become an issue. Note, however, that the overhead is not going to be worse than that for a plain array.
I iterate through the balls every 30 milliseconds, using nested for loops to see if they are intersecting with each other.
This does not have much to do with the collection in which the balls are stored. You could use a spatial index to improve the speed of finding intersections.
About ArrayList in Java, the complexity of remove at the end and add one element is Amortize O(1). Or, you can say, it's almost efficient in most cases. (In some rare cases, it will be awful.)
But you should think more carefully about your design before choosing your data structure.
How many objects often in your collection. If it's small, you can free to choose any data structure that you feel easily to work with. it will almost doesn't lost performance for your code.
If you often find one ball in all of your balls, another datastructure such as HashMap or HashSet would be better.
Or you often delete at middle of your list, maybe LinkedList will be appropriate choice :)
I'd recommend working out the way in which you need to access the balls, and pick an appropriate interface (Not implementation) eg. If you're accessing sequentially only, use a List. If you need to look up the ball by ID, think of a Map. The interface should match your requirements in terms of functionality, not in terms of speed/efficiency.
Then pick an implementation, eg. HashMap or TreeMap, and write your code.
Afterwards, profile it - Is your code inefficient in the ball access code? If so, then try to optimise by switching to an alternate implementation thats more appropriate to your needs.
I have a problem that's been bothering me for some time. I have a software where I generate a certain number of objects in a physical world, storing them in an ArrayList. However, this implementation generates lags when removing objects from the ArrayList.
A static array implementation does not lead to lags, but is less practical as I cannot use "add" and "remove".
I'm assuming the lags with ArrayLists are due to memory freeing and re-allocation. As my ArrayList has a fixed maximal size, is it possible to preallocate it a certain memory, to avoid these issues? Or is there another solution for this?
Thanks a lot in advance for any help!
I normally would not just repost someone elses answer, but this seems to fit very well.
The problem here is that an ArrayList is internally implemented, as the name states, with an array. This means that elements of the collection can't be freely inserted or removed without shifting the elements after the index you are working it.
So if your collection has a lot of elements and you, for example, remove the 5th, then all the elements from 6th to the end of the list must be shifted to the left by one position. This can be expensive indeed and leads to a O(n) complexity.
To avoid these issues, you should choose an appropriate collection according to the most common operations you are going to use on it. A LinkedList can be good if you need to iterate, remove (actually the removal requires to find the element anyway so it's good just if you are already enumerating them) or insert elements but whenever you want to access a specific index you are going into troubles.
You could also look for a HashSet or a TreeSet, they could be suitable to your solution.
In these circumstances knowing how most common data structures work and which are good/bad for is always useful to make appropriate choices.
I have specific requirements for the data structure to be used in my program in Java. It (Data Structure) should be able to hold large amounts of data (not fixed), my main operations would be to add at the end, and delete/read from the beginning (LinkedLists look good soo far). But occasionally, I need to delete from the middle also and this is where LinkedLists are soo painful. Can anyone suggest me a way around this? Or any optimizations through which I can make deletion less painful in LinkedLists?
Thanks for the help!
A LinkedHashMap may suit your purpose
You'd use an iterator to pull stuff from the front
and lookup the entry by key when you needed to access the middle of the list
LinkedList falls down on random accesses. Deletion, without the random access look up, is constant time and so really not too bad for long lists.
ArrayList is generally fast. Inserts and removes from the middle are faster than you might expect because block memory moves are surprisingly fast. Removals and insertions near the start to cause all the following data to be moved down or up.
ArrayDeque is like ArrayList only it uses a circular buffer and has a strange interface.
Usual advice: try it.
you can try using linked list with a pointers after evey 10000th element so that you can reduce the time to find the middle which you wish to delete.
here are some different variations of linked list:
http://experimentgarden.blogspot.com/2009/08/performance-analysis-of-thirty-eight.html
LinkedHashMap is probably the way to go. Great for iteration, deque operations, and seeking into the middle. Costs extra in memory, though, as you'll need to manage a set of keys on top of your basic collection. Plus I think it'll leave 'gaps' in the spaces you've deleted, leading to a non-consecutive set of keys (shouldn't affect iteration, though).
Edit: Aha! I know what you need: A LinkedMultiSet! All the benefit of a LinkedHashMap, but without the superfluous key set. It's only a little more complex to use, though.
First you need to consider whether you will delete from the center of the list often compared to the length of the list. If your list has N items but you delete much less often than 1/N, don't worry about it. Use LinkedList or ArrayDeque as you prefer. (If your lists are occasionally huge and then shrink, but are mostly small, LinkedList is better as it's easy to recover the memory; otherwise, ArrayDeque doesn't need extra objects, so it's a bit faster and more compact--except the underlying array never shrinks.)
If, on the other hand, you delete quite a bit more often than 1/N, then you should consider a LinkedHashSet, which maintains a linked list queue on top of a hash set--but it is a set, so keep in mind that you can't store duplicate elements. This has the overhead of LinkedList and ArrayDeque put together, but if you're doing central deletes often, it'll likely be worth it.
The optimal structure, however--if you really need every last ounce of speed and are willing to spend the coding time to get it--would be a "resizable" array (i.e. reallocated when it was too small) with a circular buffer where you could blank out elements from the middle by setting them to null. (You could also reallocate the buffer when too much was empty if you had a perverse use case then.) I don't advise coding this unless you either really enjoy coding high-performance data structures or have good evidence that this is one of the key bottlenecks in your code and thus you really need it.