Related
Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.
I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in-place using Arrays.sort() whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
I think you are doomed to use a custom List implementation augmented with your own parallelSort or else change all your other code to store the big data in Array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sorts in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance on the other hand, can't be easily copied. New lists would have to be allocated which poses two problems. First, this means allocating some new objects which is likely more costly than allocating arrays. Second, the algorithm would have to choose what implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the soiurce is also an array). Or, let the user provide some list factory object, which makes the code much more complicated.
Related to the previous issue: there is no obvious way of copying a list into another due to how the API is designed. The best the List interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs adding elements one by one which many implementations do).
Most lists that need to be sorted will be small enough for another copy to not be an issue.
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't bother to create a temporary array (Collections.sort is doing it anyway).
This uses the initial list and does not create a new one as in the parallelStream solution.
// Convert List to Array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that Collections.sort begins with this very same conversion, so we're not adding overhead
// in comparaison with Collections.sort.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Multithread the TimSort. Automatically fallback to mono-thread when size is less than 8192.
Arrays.parallelSort(fooArr, Comparator.comparingStuff(Foo::yourmethod));
// Refill the List using the sorted Array, the same way Collections.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
i.next();
i.set(e);
}
In my program I often use collections to store lists of objects. Currently I use ArrayList to store objects.
My question is: is this a best choice? May be its better to use LinkedList? Or something else?
Criteria to consider are:
Memory usage
Performance
Operations which I need are:
Add element to collection
Iterate through the elements
Any thoughts?
Update: my choice is : ArrayList :) Basing on this discussion as well as the following ones:
When to use LinkedList over ArrayList?
List implementations: does LinkedList really perform so poorly vs. ArrayList and TreeList?
I always default to ArrayList, and would in your case as well, except when
I need thread safety (in which case I start looking at List implementations in java.util.concurrent)
I know I'm going to be doing lots of insertion and manipulation to the List or profiling reveals my usage of an ArrayList to be a problem (very rare)
As to what to pick in that second case, this SO.com thread has some useful insights: List implementations: does LinkedList really perform so poorly vs. ArrayList and TreeList?
I know I'm late but, maybe, this page can help you, not only now, but in the future...
Linked list is faster for adding/removing inside elements (ie not head or tail)
Arraylist is faster for iterating
It's a classic tradeoff between insert vs. retrieve optimization. Common choice for the task as you describe it is the ArrayList.
ArrayList is fine for your (and most other) purposes. It has a very small memory overhead and has good amortized performance for most operations. The cases where it is not ideal are relatively rare:
The list ist very large
You frequently need to do one of these operations:
Add/remove items during iteration
Remove items from the beginning of the list
If you're only adding at the end of the list, ArrayList should be ok. From the documentation of ArrayList:
The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost
and ArrayList should also use less memory than a linked list as you don't need to use space for the links.
It depends on your usage profile.
Do you add to the end of the list? Both are fine for this.
Do you add to the start of the list? LinkedList is better for this.
Do you require random access (will you ever call get(n) on it)? ArrayList is better for this.
Both are good at iterating, both Iterator implementations are O(1) for next().
If in doubt, test your own app with each implementation and make your own choice.
Given your criteria, you should be using the LinkedList.
LinkedList implements the Deque interface which means that it can add to the start or end of the list in constant time (1). In addition, both the ArrayList and LinkedList will iterate in (N) time.
You should NOT use the ArrayList simply because the cost of adding an element when the list is full. In this case, adding the element would be (N) because of the new array being created and copying all elements from one array to the other.
Also, the ArrayList will take up more memory because the size of your backing array might not be completely filled.
Assume you need to store/retrieve items in a Collection, don't care about ordering, and duplicates are allowed, what type of Collection do you use?
By default, I've always used ArrayList, but I remember reading/hearing somewhere that a Queue implementation may be a better choice. A List allows items to be added/retrieved/removed at arbitrary positions, which incurs a performance penalty. As a Queue does not provide this facility it should in theory be faster when this facility is not required.
I realise that all discussions about performance are somewhat meaningless, the only thing that really matters is measurement. Nevertheless, I'm interested to know what others use for a Collection, when they don't care about ordering, and duplicates are allowed, and why?
"It depends". The question you really need to answer first is "What do I want to use the collection for?"
If you often insert / remove items on one of the ends (beginning, end) a Queue will be better than a ArrayList. However in many cases you create a Collection in order to just read from it. In this case a ArrayList is far more efficient: As it is implemented as an array, you can iterate over it quite efficient (same applies for a LinkedList). However a LinkedList uses references to link single items together. So if you do not need random removals of items (in the middle), a ArrayList is better: An ArrayList will use less memory as the items don't need the storage for the reference to the next/prev item.
To sum it up:
ArrayList = good if you insert once and read often (random access or sequential)
LinkedList = good if you insert/remove often at random positions and read only sequential
ArrayDeque (java6 only) = good if you insert/remove at start/end and read random or sequential
As a default, I tend to prefer LinkedList to ArrayList. Obviously, I use them not through the List interface, but rather through the Collection interface.
Over the time, I've indeed found out that when I need a generic collection, it's more or less to put some things in, then iterate over it. If I need more evolved behaviour (say random access, sorting or unicity checks), I will then maybe change the used implementation, but before that I will change the used interface to the most appropriated. This way, I can ensure feature is provided before to concentrate on optimization and implementation.
ArrayList basicly contains an array inside (that's why it is called ArrayList). And operations like addd/remove at arbitrary positions are done in a straightforward way, so if you don't use them - there is no harm to performance.
If ordering and duplicates is not a problem and case is only for storing,
I use ArrayList, As it implements all the list operations. Never felt any performance issues with these operations (Never impacted my projects either). Actually using these operations have simple usage & I don't need to care how its managed internally.
Only if multiple threads will be accessing this list I use Vector because its methods are synchronized.
Also ArrayList and Vector are collections which you learn first :).
It depends on what you know about it.
If I have no clue, I tend to go for a linked list, since the penalty for adding/removing at the end is constant. If I have a rough idea of the maximum size of it, I go for an arraylist with the capacity specified, because it is faster if the estimation is good. If I really know the exact size I tend to go for a normal array; although that isn't really a collection type.
I realise that all discussions about performance are somewhat meaningless, the only thing that really matters is measurement.
That's not necessarily true.
If your knowledge of how the application is going to work tells you that certain collections are going to be very large, then it is a good idea to pick the right collection type. But the right collection type depends crucially on how the collections are going to be used; i.e. on the algorithms.
For example, if your application is likely to be dominated by testing if a collection holds a given object, the fact that Collection.contains(Object) is O(N) for both LinkedList<T> and ArrayList<T> might mean that neither is an appropriate collection type. Instead, maybe you should represent the collection as a HashMap<T, Integer>, where the Integer represents the number of occurrences of a T in the "collection". That will give you O(1) testing and removal, at the cost of more space overheads and slower (though still O(1)) insertion.
But the thing to stress is that if you are likely to be dealing with really large collections, there should be no such thing as a "default" collection type. You need to think about the collection in the context of the algorithms. (And the flip side is that if the collections are always going to be small, it probably makes little difference which collection type you pick.)
What is the need of Collection framework in Java since all the data operations(sorting/adding/deleting) are possible with Arrays and moreover array is suitable for memory consumption and performance is also better compared with Collections.
Can anyone point me a real time data oriented example which shows the difference in both(array/Collections) of these implementations.
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
Several reasons:
Java's collection classes provides a higher level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementation you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in arrays.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developers learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
Arrays are not efficient always. What if you need something like LinkedList? Looks like you need to learn some data structure : http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections came up with different functionality,usability and convenience.
When in an application we want to work on group of Objects, Only ARRAY can not help us,Or rather they might leads to do things with some cumbersome operations.
One important difference, is one of usability and convenience, especially given that Collections automatically expand in size when needed:
Collections came up with methods to simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Here you find use of each collection as per scenario:
Collection is the framework in Java and you know that framework is very easy to use rather than implementing and then use it and your concern is that why we don't use the array there are drawbacks of array like it is static you have to define the size of row at least in beginning, so if your array is large then it would result primarily in wastage of large memory.
So you can prefer ArrayList over it which is inside the collection hierarchy.
Complexity is other issue like you want to insert in array then you have to trace it upto define index so over it you can use LinkedList all functions are implemented only you need to use and became your code less complex and you can read there are various advantages of collection hierarchy.
Collection framework are much higher level compared to Arrays and provides important interfaces and classes that by using them we can manage groups of objects with a much sophisticated way with many methods already given by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
Treemap - TreeMap stores key-value pairs in a sorted ascending order and retrieval speed of an element out of a TreeMap is quite fast.
To learn more about Java collections, check out this article.
What is the fastest list implementation (in java) in a scenario where the list will be created one element at a time then at a later point be read one element at a time? The reads will be done with an iterator and then the list will then be destroyed.
I know that the Big O notation for get is O(1) and add is O(1) for an ArrayList, while LinkedList is O(n) for get and O(1) for add. Does the iterator behave with the same Big O notation?
It depends largely on whether you know the maximum size of each list up front.
If you do, use ArrayList; it will certainly be faster.
Otherwise, you'll probably have to profile. While access to the ArrayList is O(1), creating it is not as simple, because of dynamic resizing.
Another point to consider is that the space-time trade-off is not clear cut. Each Java object has quite a bit of overhead. While an ArrayList may waste some space on surplus slots, each slot is only 4 bytes (or 8 on a 64-bit JVM). Each element of a LinkedList is probably about 50 bytes (perhaps 100 in a 64-bit JVM). So you have to have quite a few wasted slots in an ArrayList before a LinkedList actually wins its presumed space advantage. Locality of reference is also a factor, and ArrayList is preferable there too.
In practice, I almost always use ArrayList.
First Thoughts:
Refactor your code to not need the list.
Simplify the data down to a scalar data type, then use: int[]
Or even just use an array of whatever object you have: Object[] - John Gardner
Initialize the list to the full size: new ArrayList(123);
Of course, as everyone else is mentioning, do performance testing, prove your new solution is an improvement.
Iterating through a linked list is O(1) per element.
The Big O runtime for each option is the same. Probably the ArrayList will be faster because of better memory locality, but you'd have to measure it to know for sure. Pick whatever makes the code clearest.
Note that iterating through an instance of LinkedList can be O(n^2) if done naively. Specifically:
List<Object> list = new LinkedList<Object>();
for (int i = 0; i < list.size(); i++) {
list.get(i);
}
This is absolutely horrible in terms of efficiency due to the fact that the list must be traversed up to i twice for each iteration. If you do use LinkedList, be sure to use either an Iterator or Java 5's enhanced for-loop:
for (Object o : list) {
// ...
}
The above code is O(n), since the list is traversed statefully in-place.
To avoid all of the above hassle, just use ArrayList. It's not always the best choice (particularly for space efficiency), but it's usually a safe bet.
There is a new List implementation called GlueList which is faster than all classic List implementations.
Disclaimer: I am the author of this library
You almost certainly want an ArrayList. Both adding and reading are "amortized constant time" (i.e. O(1)) as specified in the documentation (note that this is true even if the list has to increase it's size - it's designed like that see http://java.sun.com/j2se/1.5.0/docs/api/java/util/ArrayList.html ). If you know roughly the number of objects you will be storing then even the ArrayList size increase is eliminated.
Adding to the end of a linked list is O(1), but the constant multiplier is larger than ArrayList (since you are usually creating a node object every time). Reading is virtually identical to the ArrayList if you are using an iterator.
It's a good rule to always use the simplest structure you can, unless there is a good reason not to. Here there is no such reason.
The exact quote from the documentation for ArrayList is: "The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation."
I suggest benchmarking it. It's one thing reading the API, but until you try it for yourself, it'd academic.
Should be fair easy to test, just make sure you do meaningful operations, or hotspot will out-smart you and optimise it all to a NO-OP :)
I have actually begun to think that any use of data structures with non-deterministic behavior, such as ArrayList or HashMap, should be avoided, so I would say only use ArrayList if you can bound its size; any unbounded list use LinkedList. That is because I mainly code systems with near real time requirements though.
The main problem is that any memory allocation (which could happen randomly with any add operation) could also cause a garbage collection, and any garbage collection can cause you to miss a target. The larger the allocation, the more likely this is to occur, and this is also compounded if you are using CMS collector. CMS is non-compacting, so finding space for a new linked list node is generally going to be easier than finding space for a new 10,000 element array.
The more rigorous your approach to coding, the closer you can come to real time with a stock JVM. But choosing only data structures with deterministic behavior is one of the first steps you would have to take.