I am receiving XML and need to convert to either a primitive Array or ArrayList.
Is there much difference in terms of performance in terms of memory and garbage collection?
My application will be creating thousand of these objects every second and I need to minimize GC as I need real-time performance.
Thxs
Primitive arrays are much more efficient, as they don't require wrapper objects. Guava has List implementations that are backed by primitive arrays (example: Ints.asList(int[])), perhaps that could be a reasonable solution for you: get the power of a collection but only use Objects when you actually need them.
Primitive arrays are always more efficient, but by how much depends on the exact details of your use case. I've recently sped up performance by a factor of 7, by ripping out the ArrayLists, and replacing them with primitive arrays, in the inner-most loops. The use case was an O(n^2) algorithm applied to lists 100-1000 characters long. I then did a controlled experiment, comparing the performance of a int[] array to a ArrayList, and interestingly, as the array/list sizes get bigger, the JIT compiler seems to kick in, and the performance penalty becomes a lot less (only ~20%). But for list sizes less than 500, the performance penalty of an ArrayList can be up to a factor of 10. So if you've got a frequently called method, which is manipulating lots of small lists or arrays (as was with my use case), using primitave arrays can have a big performance impact.
As Sean Patrick Floyd pointed out, primitive arrays are much more efficient.
However, there are cases where one would definitely prefer Collections. But as long as you just iterate over the Objects, there is no need for Collections.
Linked lists are good for inserts/deletes, and arrays are good for random access.
Related
I am concerned with different styles of creating a sorted array from a Java PriorityQueue containing a few thousand elements. The Java 8 docs say
If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
However, I do like the streaming API, so what I had at first was
Something[] elems = theHeap.stream().sorted(BY_CRITERION.reversed())
.toArray(Something[]::new);
(Where BY_CRITERION is the custom comparator of the PriorityQueue, and I do want the reverse order of that.) Is there any disadvantage to using this idiom, as compared to the following:
Something[] elems = theHeap.toArray(new Something[0]);
Arrays.sort(elems, BY_CRITERION.reversed());
This latter code certainly appears to be following the API doc recommendation more directly, but apart from that, is it really more efficient in terms of memory, such as fewer temporary structures being allocated etc.?
I would think that the streaming solution would have to buffer the stream elements in a temporary structure (an array?) and then sort them, and finally copy the sorted elements to the array allocated in toArray().
While the imperative solution would buffer the heap elements in a newly allocated array and then sort them. So this might be one copy operation fewer. (And one array allocation. The discussion of Collection.toArray(new T[size]) vs. Collection.toArray(new T[0]) is tangentially related here. For example, see here for why on OpenJDK the latter is faster.)
And what about sorting efficiency? The docs of Arrays.sort() say
Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays
while the documentation of Stream.sorted() is silent on this point. So at least in terms of being reliably documented, the imperative solution would seem to have an advantage.
But is there anything more to know?
Fundamentally, both variants do the same and since both are valid solutions within the library’s intended use cases, there is no reason why the implementations should prefer one over the other regarding choosing algorithms or adding optimizations.
In practice, this means that the most expensive operation, the sorting, ends up at the same implementation methods internally. The sorted(…) operation of the Stream implementation buffers all elements into an intermediate array, followed by calling Arrays.sort(T[], int, int, Comparator<? super T>), which will delegate to the same method as the method Arrays.sort(T[], Comparator<? super T>) you use in your first variant, the target being a sort method within the internal TimSort class.
So everything that has been said about the time and space complexity for Arrays.sort applies to the Stream.sort as well. But there is a performance difference though. For the OpenJDK implementations up to Java 10, the Stream is not capable of fusing the sorted with the subsequent toArray step, to directly use the result array for the sorting step. So currently, the Stream variant bears a final copying step from the intermediate array used for sorting to the final array created by the function passed to toArray. But future implementations might learn this trick, then, there will be no relevant performance different at all between the two solutions.
This question already has answers here:
Array or List in Java. Which is faster?
(32 answers)
Closed 9 years ago.
Which one is better in performance between Array of type Object and ArrayList of type Object?
Assume we have a Array of Animal objects : Animal animal[] and a arraylist : ArrayList list<Animal>
Now I am doing animal[10] and list.get(10)
which one should be faster and why?
It is pretty obvious that array[10] is faster than array.get(10), as the later internally does the same call, but adds the overhead for the function call plus additional checks.
Modern JITs however will optimize this to a degree, that you rarely have to worry about this, unless you have a very performance critical application and this has been measured to be your bottleneck.
From here:
ArrayList is internally backed by Array in Java, any resize operation
in ArrayList will slow down performance as it involves creating new
Array and copying content from old array to new array.
In terms of performance Array and ArrayList provides similar
performance in terms of constant time for adding or getting element if
you know index. Though automatic resize of ArrayList may slow down
insertion a bit Both Array and ArrayList is core concept of Java and
any serious Java programmer must be familiar with these differences
between Array and ArrayList or in more general Array vs List.
When deciding to use Array or ArrayList, your first instinct really shouldn't be worrying about performance, though they do perform differently. You first concern should be whether or not you know the size of the Array before hand. If you don't, naturally you would go with an array list, just for functionality.
I agree with somebody's recently deleted post that the differences in performance are so small that, with very very few exceptions, (he got dinged for saying never) you should not make your design decision based upon that.
In your example, where the elements are Objects, the performance difference should be minimal.
If you are dealing with a large number of primitives, an array will offer significantly better performance, both in memory and time.
Arrays are better in performance. ArrayList provides additional functionality such as "remove" at the cost of performance.
Of course, I know about the performance difference between arraylist and linkedlist. I have run tests myself and seen the huge difference in time and memory for insertion/deletion and iteration between arraylist and linkedlist for a very big list.
(Correct me if i am wrong)We generally prefer arraylist over linkedlist because:
1)We practically do iterations more often than insertion/deletion. So we prefer iterations to be faster than insertion/deletion.
2)The memory overhead of linkedlist is much more than arraylist
3)There is NO way in which we can define a list as linkedlist while inserting/deleting in batch, and as arraylist while iterating. It is because arraylist and linkedlist have fundamentally different data-storage techniques.
Am I wrong about the 3rd point [I hope so :)]? Is there any possibility to have benefits of these two data structures in a single list? I guess, data structure designers must have thought about it.
If you are looking for some more performant collection implementations, check out Javolution. That package provides a FastList and FastTable which may at least reduce the cost of choosing between linked lists and array lists.
You might want to look into Clojure's "vectors" (which are a lot more than a simple array under the hood): http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/. They are O(log32 n) for lookup and insertion.
Note that these are directly usable from Java! (Actually, they're implemented in Java code.)
Probably we have another points to consider, but one aspect that make me choose LinkedList over ArrayList is when:
When I don't need to get an element by index (in case of process all elements)
When I don't know the size when creating my list
Here is an interesting manifesto about this topic.
Assume you need to store/retrieve items in a Collection, don't care about ordering, and duplicates are allowed, what type of Collection do you use?
By default, I've always used ArrayList, but I remember reading/hearing somewhere that a Queue implementation may be a better choice. A List allows items to be added/retrieved/removed at arbitrary positions, which incurs a performance penalty. As a Queue does not provide this facility it should in theory be faster when this facility is not required.
I realise that all discussions about performance are somewhat meaningless, the only thing that really matters is measurement. Nevertheless, I'm interested to know what others use for a Collection, when they don't care about ordering, and duplicates are allowed, and why?
"It depends". The question you really need to answer first is "What do I want to use the collection for?"
If you often insert / remove items on one of the ends (beginning, end) a Queue will be better than a ArrayList. However in many cases you create a Collection in order to just read from it. In this case a ArrayList is far more efficient: As it is implemented as an array, you can iterate over it quite efficient (same applies for a LinkedList). However a LinkedList uses references to link single items together. So if you do not need random removals of items (in the middle), a ArrayList is better: An ArrayList will use less memory as the items don't need the storage for the reference to the next/prev item.
To sum it up:
ArrayList = good if you insert once and read often (random access or sequential)
LinkedList = good if you insert/remove often at random positions and read only sequential
ArrayDeque (java6 only) = good if you insert/remove at start/end and read random or sequential
As a default, I tend to prefer LinkedList to ArrayList. Obviously, I use them not through the List interface, but rather through the Collection interface.
Over the time, I've indeed found out that when I need a generic collection, it's more or less to put some things in, then iterate over it. If I need more evolved behaviour (say random access, sorting or unicity checks), I will then maybe change the used implementation, but before that I will change the used interface to the most appropriated. This way, I can ensure feature is provided before to concentrate on optimization and implementation.
ArrayList basicly contains an array inside (that's why it is called ArrayList). And operations like addd/remove at arbitrary positions are done in a straightforward way, so if you don't use them - there is no harm to performance.
If ordering and duplicates is not a problem and case is only for storing,
I use ArrayList, As it implements all the list operations. Never felt any performance issues with these operations (Never impacted my projects either). Actually using these operations have simple usage & I don't need to care how its managed internally.
Only if multiple threads will be accessing this list I use Vector because its methods are synchronized.
Also ArrayList and Vector are collections which you learn first :).
It depends on what you know about it.
If I have no clue, I tend to go for a linked list, since the penalty for adding/removing at the end is constant. If I have a rough idea of the maximum size of it, I go for an arraylist with the capacity specified, because it is faster if the estimation is good. If I really know the exact size I tend to go for a normal array; although that isn't really a collection type.
I realise that all discussions about performance are somewhat meaningless, the only thing that really matters is measurement.
That's not necessarily true.
If your knowledge of how the application is going to work tells you that certain collections are going to be very large, then it is a good idea to pick the right collection type. But the right collection type depends crucially on how the collections are going to be used; i.e. on the algorithms.
For example, if your application is likely to be dominated by testing if a collection holds a given object, the fact that Collection.contains(Object) is O(N) for both LinkedList<T> and ArrayList<T> might mean that neither is an appropriate collection type. Instead, maybe you should represent the collection as a HashMap<T, Integer>, where the Integer represents the number of occurrences of a T in the "collection". That will give you O(1) testing and removal, at the cost of more space overheads and slower (though still O(1)) insertion.
But the thing to stress is that if you are likely to be dealing with really large collections, there should be no such thing as a "default" collection type. You need to think about the collection in the context of the algorithms. (And the flip side is that if the collections are always going to be small, it probably makes little difference which collection type you pick.)
Java 6's Arrays.sort method uses Quicksort for arrays of primitives and merge sort for arrays of objects. I believe that most of time Quicksort is faster than merge sort and costs less memory. My experiments support that, although both algorithms are O(n log(n)). So why are different algorithms used for different types?
The most likely reason: quicksort is not stable, i.e. equal entries can change their relative position during the sort; among other things, this means that if you sort an already sorted array, it may not stay unchanged.
Since primitive types have no identity (there is no way to distinguish two ints with the same value), this does not matter for them. But for reference types, it could cause problems for some applications. Therefore, a stable merge sort is used for those.
OTOH, a reason not to use the (guaranteed n*log(n)) stable merge sort for primitive types might be that it requires making a clone of the array. For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.
According to Java 7 API docs cited in this answer, Arrays#Sort() for object arrays now uses TimSort, which is a hybrid of MergeSort and InsertionSort. On the other hand, Arrays#sort() for primitive arrays now uses Dual-Pivot QuickSort. These changes were implemented starting in Java SE 7.
One reason I can think of is that quicksort has a worst case time complexity of O(n^2) while mergesort retains worst case time of O(n log n). For object arrays there is a fair expectation that there will be multiple duplicate object references which is one case where quicksort does worst.
There is a decent visual comparison of various algorithms, pay particular attention to the right-most graph for different algorithms.
I was taking Coursera class on Algorithms and in one of the lectures Professor Bob Sedgewick mentioning the assessment for Java system sort:
"If a programmer is using objects, maybe space is not a critically
important consideration and the extra space used by a merge sort maybe
not a problem. And if a programmer is using primitive types, maybe
the performance is the most important thing so they use quick sort."
java.util.Arrays uses quicksort for primitive types such as int and mergesort for objects that implement Comparable or use a Comparator. The idea of using two different methods is that if a programmer’s using objects maybe space is not a critically important consideration and so the extra space used by mergesort maybe’s not a problem and if the programmer’s using primitive types maybe performance is the most important thing so use the quicksort.
For Example:
This is the example when sorting stability matters.
That’s why stable sorts make sense for object types, especially mutable object types and object types with more data than just the sort key, and mergesort is such a sort. But for primitive types stability is not only irrelevant. It’s meaningless.
Source: INFO
Java's Arrays.sort method uses quicksort, insertion sort and mergesort. There is even both a single and dual pivot quicksort implemented in the OpenJDK code. The fastest sorting algorithm depends on the circumstances and the winners are: insertion sort for small arrays (47 currently chosen), mergesort for mostly sorted arrays, and quicksort for the remaining arrays so Java's Array.sort() tries to choose the best algorithm to apply based on those criteria.