In Java, when would it be preferential to use a List rather than an Array?
I see the question as being the opposite-
When should you use an Array over a List?
Only you have a specific reason to do so (eg: Project Constraints, Memory Concerns (not really a good reason), etc.)
Lists are much easier to use (imo), and have much more functionality.
Note: You should also consider whether or not something like a Set, or another datastructure is a better fit than a List for what you are trying to do.
Each datastructure, and implmentation, has different pros/cons. Pick the ones that excel at the things that you need to do.
If you need get() to be O(1) for any item? Likely use an ArrayList, Need O(1) insert()? Possibly a Linked List. Need O(1) contains()? Possibly a Hashset.
TLDR: Each data structure is good at some things, and bad at others. Look at your objectives and choose the data structure that best fits the given problem.
Edit:
One thing not noted is that you're
better off declaring the variable as
its interface (i.e. List or Queue)
rather than its implementing class.
This way, you can change the
implementation at some later date
without changing anything else in the
code.
As an example:
List<String> myList = new ArrayList<String>();
vs
List<String> myList = new LinkedList<String>();
Note that myList is a List in both examples.
--R. Bemrose
Rules of thumb:
Use a List for reference types.
Use arrays for primitives.
If you have to deal with an API that is using arrays, it might be useful to use arrays. OTOH, it may be useful to enforce defensive copying with the type system by using Lists.
If you are doing a lot of List type operations on the sequence and it is not in a performance/memory critical section, then use List.
Low-level optimisations may use arrays. Expect nastiness with low-level optimisations.
Most people have answered it already.
There are almost no good reason to use an array instead of List. The main exception being the primitive array (like int[]). You cannot create a primitive list (must have List<Integer>).
The most important difference is that when using List you can decide what implementation will be used. The most obvious is to chose LinkedList or ArrayList.
I would like to point out in this answer that choosing the implementation gives you very fine grained control over the data that is simply not available to array:
You can prevent client from modifying your list by wrapping your list in a Collection.unmodifiableList
You can synchronize a list for multithreading using Collection.synchronizedList
You can create a fixed length queue with implementation of LinkedBlockingQueue
... etc
In any case, even if you don't want (now) any extra feature of the list. Just use an ArrayList and size it with the size of the array you would have created. It will use an Array in the back-end and the performance difference with a real array will be negligible. (except for primitive arrays)
Pretty much always prefer a list. Lists have much more functionality, particularly iterator support. You can convert a list to an array at any time with the toArray() method.
Always prefer lists.
Arrays when
Varargs for a method ( I guess you are forced to use Arrays here ).
When you want your collections to be covariant ( arrays of reference types are covariant ).
Performance critical code.
If you know how many things you'll be holding, you'll want an array. My screen is 1024x768, and a buffer of pixels for that isn't going to change in size ever during runtime.
If you know you'll need to access specific indexes (go get item #763!), use an array or array-backed list.
If you need to add or remove items from the group regularly, use a linked list.
In general, dealing with hardware, arrays, dealing with users, lists.
It depends on what kind of List.
It's better to use a LinkedList if you know you'll be inserting many elements in positions other than the end. LinkedList is not suitable for random access (getting the i'th element).
It's better to use an ArrayList if you don't know, in advance, how many elements there are going to be. The ArrayList correctly amortizes the cost of growing the backing array as you add more elements to it, and is suitable for random access once the elements are in place. An ArrayList can be efficiently sorted.
If you want the array of items to expand (i.e. if you don't know what the size of the list will be beforehand), a List will be beneficial. However, if you want performance, you would generally use an array.
In many cases the type of collection used is an implementation detail which shouldn't be exposed to the outside world. The more generic your returntype is the more flexibility you have changing the implementation afterwards.
Arrays (primitive type, ie. new int[10]) are not generic, you won't be able to change you implementation without an internal conversion or altering the client code. You might want to consider Iterable as a returntype.
Related
Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.
I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
There doesn't appear to be any straightforward way to sort a List in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp) is that the Collections implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp). As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp) extension method over Collections.sort(list, cmp). It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp) instead of Collections.sort(myList, cmp). The difference is that myList.sort(cmp), being an interface extension method, can be overridden by the specific List implementation. For example, ArrayList.sort(cmp) sorts the list in-place using Arrays.sort() whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort extension method to the List interface that has similar semantics to List.sort but does the sorting in parallel. This would allow ArrayList to do a straightforward in-place sort using Arrays.parallelSort. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List implementation and override sort() to sort in parallel. Or you could subclass ArrayList, override sort(), grab the elementData array via reflection and call parallelSort() on it. Of course you could just write your own List implementation and provide a parallelSort() method, but the advantage of overriding List.sort() is that this works on the plain List interface and you don't have to modify all the code in your code base to use a different List subclass.
I think you are doomed to use a custom List implementation augmented with your own parallelSort or else change all your other code to store the big data in Array types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List "i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List implementer is just prevented from doing it.
A parallel sort on List has the same problem. The standard sequential sort fixes it with custom sorts in the concrete List implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
Use the following:
yourCollection.parallelStream().sorted().collect(Collectors.toList());
This will be parallel when sorting, because of parallelStream(). I believe this is what you mean by parallel sort?
Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:
Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance on the other hand, can't be easily copied. New lists would have to be allocated which poses two problems. First, this means allocating some new objects which is likely more costly than allocating arrays. Second, the algorithm would have to choose what implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the soiurce is also an array). Or, let the user provide some list factory object, which makes the code much more complicated.
Related to the previous issue: there is no obvious way of copying a list into another due to how the API is designed. The best the List interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs adding elements one by one which many implementations do).
Most lists that need to be sorted will be small enough for another copy to not be an issue.
So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).
By combining the existing answers I came up with this code.
This works if you are not interested in creating a custom List class and if you don't bother to create a temporary array (Collections.sort is doing it anyway).
This uses the initial list and does not create a new one as in the parallelStream solution.
// Convert List to Array so we can use Arrays.parallelSort rather than Collections.sort.
// Note that Collections.sort begins with this very same conversion, so we're not adding overhead
// in comparaison with Collections.sort.
Foo[] fooArr = fooLst.toArray(new Foo[0]);
// Multithread the TimSort. Automatically fallback to mono-thread when size is less than 8192.
Arrays.parallelSort(fooArr, Comparator.comparingStuff(Foo::yourmethod));
// Refill the List using the sorted Array, the same way Collections.sort does it.
ListIterator<Foo> i = fooLst.listIterator();
for (Foo e : fooArr) {
i.next();
i.set(e);
}
I am writing a program that will be heavily reliant on ... something ... that stores data like an array where I am able to access any point of the data at any given time as I can in an array.
I know that the java library has an Array class that I could use or I could use a raw array[].
I expect that using the Array type is a bit easier to code, but I expect that it is slightly less efficient as well.
My question is, which is better to use between these two, and is there a better way to accomplish the same result?
Actually Array would be of no help -- it's not what you think it is. The class java.util.ArrayList, on the other hand, is. In general, if you can program with collection classes like ArrayList, do so -- you'll more easily arrive at correct, flexible software that's easier to read, too. And that "if" applies almost all the time; raw arrays are something you use as a last resort or, more often, when a method you want to call requires one as an argument.
The Array class is used for Java reflection and is very, very, rarely used.
If you want to store data in an array, use plain old arrays, indicated with [], or as Gabe's comment on the question suggests, java.util.ArrayList. ArrayList is, as your comment suggests easier to code (when it comes to adding and removing elements!!) but yes, is slightly less efficient. For variable-size collections, ArrayList is all but required.
My question is, which is better to use between these two, and is there a better way to accomplish the same result?
It depends on what you are trying to achieve:
If the number of elements in the array is known ahead of time, then an array type is a good fit. If not, a List type is (at least) more convenient to use.
The List interface offers a number of methods such as contains, insert, remove and so on that can save you coding ... if you need to do that sort of thing.
If properly used, an array type will use less space. The difference is particularly significant for arrays of primitive types where using a List means that the elements need to be represented using wrapper types (e.g. byte becomes Byte).
The Array class is not useful in this context, and neither is the Arrays class. The choice is between ArrayList (or some other List implementation class) and primitive arrays.
In terms of ease of use, the Array class is a lot easier to code.
The array[] is quite a problem in terms of the case that you need to know
the size of the list of objects beforehand.
Instead, you could use a HashMap. It is very efficient in search as well as sorting as
the entire process is carried out in terms of key values.
You could declare a HashMap as:
HashMap<String, Object> map = new HashMap<String, Object>();
For the Object you can use your class, and for key use the value which needs to be unique.
When is it better to use a vector than an array and vice versa in java? and why?
Vector: never, unless an API requires it, because it's a class, not an interface.
List: this should be your default array-like collection. It's an interface so anything can be a List if it needs to. (and there are lots of List implementations out there e.g. ArrayList, LinkedList, CopyOnWriteArrayList, ImmutableList, for various feature sets)
Vector is threadsafe, but so is the Collections.synchronizedList() wrapper.
array: rarely, if required by an API. The one other major advantage of arrays is when you need a fixed-length array of primitives, the memory space required is fairly compact, as compared to a List<Integer> where the integers need to be boxed into Integer objects.
A Vector (or List) when you don't know before hand how many elements are going to be inserted.
An array when you absolutely know what's the maximum number of elements on that vector's whole life.
Since there are no high performance penalties when using List or Vector (any collection for that matter), I would always chose to use them. Their flexibility is too important to not be considered.
Nowadays I only use arrays when I absolutely need to. Example: when using an API that requires them.
First off ArrayList is a faster implementation than Vector (but not thread safe though).
Arrays are handy when you know the length beforehand and will not change much (or at all).
When declaring a method, use List.
Do not use Vector, it's an early part of the JDK and was retrofitted to work with Collections.
If there's a very performance-sensitive algorithm, a private array member can be helpful. If there's a need to return its contents or pass them to a method, it's generally best to construct an object around it, perhaps as simple as Arrays.asList(thePrivateArray). For a thread-safe list: Collections.synchronizedList(Arrays.asList(thePrivateArray)). In order to prevent modification of the array contents, I typically use Collections.unmodifiableList(Arrays.asList(thePrivateArray)).
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there? Are there any performance drawbacks?
Can a list be initialised with elements in a single statement like an array (list of all elements separated by commas) ?
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there ?
In some (probably most) circumstances yes, it is definitely advisable to use collections anyway, in some circumstances it is not advisable.
On the pro side:
If you use an List instead of an array, your code can use methods like contains, insert, remove and so on.
A lot of library classes expect collection-typed arguments.
You don't need to worry that the next version of the code may require a more dynamically sized array ... which would make an initial array-based approach a liability.
On the con side:
Collections are a bit slower, and more so if the base type of your array is a primitive type.
Collections do take more memory, especially if the base type of your array is a primitive type.
But performance is rarely a critical issue, and in many cases the performance difference is not relevant to the big picture.
And in practice, there is often a cost in performance and/or code complexity involved in working out what the array's size should be. (Consider the hypothetical case where you used a char[] to hold the concatenation of a series. You can work out how big the array needs to be; e.g. by adding up the component string sizes. But it is messy!)
Collections/lists are more flexible and provide more utility methods. For most situations, any performance overhead is negligible.
And for this single statement initialization, use:
Arrays.asList(yourArray);
From the docs:
Returns a fixed-size list backed by the specified array. (Changes to the returned list "write through" to the array.) This method acts as bridge between array-based and collection-based APIs, in combination with Collection.toArray. The returned list is serializable and implements RandomAccess.
My guess is that this is the most performance-wise way to convert to a list, but I may be wrong.
1) a Collection is the most basic type and only implies there is a collection of objects. If there is no order or duplication use java.util.Set, if there is possible duplication and ordering use java.util.List, is there is ordering but no duplication use java.util.SortedSet
2) Curly brackets to instantiate an Array, Arrays.asList() plus generics for the type inference
List<String> myStrings = Arrays.asList(new String[]{"one", "two", "three"});
There is also a trick using anonymous types but personally I'm not a big fan:
List<String> myStrings = new ArrayList<String>(){
// this is the inside of an anonymouse class
{
// this is the inside of an instance block in the anonymous class
this.add("one");
this.add("two");
this.add("three");
}};
Yes, it is advisable.
Some of the various list constructors (like ArrayList) even take arguments so you can "pre-allocate" sufficient backing storage, alleviating the need for the list to "grow" to the proper size as you add elements.
There are different things to consider: Is the type of the array known? Who accesses the array?
There are several issues with arrays, e.g.:
you can not create generic arrays
arrays are covariant: if A extends B -> A[] extends B[], which can lead to ArrayStoreExceptions
you cannot make the fields of an array immutable
...
Also see, item 25 "Prefer lists to arrays" of the Effective Java book.
That said, sometimes arrays are convenient, e.g. the new Object... parameter syntax.
How can a list be initialised with elements in a single statement like an array = {list of all elements separated by commas} ?
Arrays.asList(): http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html#asList%28T...%29
Is it advisable to use Java Collections List in the cases when you know the size of the list before hand and you can also use array there ? Performance drawbacks ???
If an array is enough, then use an array. Just to keep things simple. You may even get a slightly better performance out of it. Keep in mind that if you...
ever need to pass the resulting array to a method that takes a Collection, or
if you ever need to work with List-methods such as .contains, .lastIndexOf, or what not, or
if you need to use Collections methods, such as reverse...
then may just as well go for the Collection/List classes from the beginning.
How can a list be initialised with elements in a single statement like an array = {list of all elements separated by commas} ?
You can do
List<String> list = Arrays.asList("foo", "bar");
or
List<String> arrayList = new ArrayList<String>(Arrays.asList("foo", "bar"));
or
List<String> list = new ArrayList<String>() {{ add("foo"); add("bar"); }};
Is it advisable to use Java
Collections List in the cases when you
know the size of the list before hand
and you can also use array there ?
Performance drawbacks ?
It can be perfectly acceptable to use a List instead of an array, even if you know the size before hand.
How can a list be initialised with
elements in a single statement like an
array = {list of all elements
separated by commas} ?
See Arrays.asList().
What is the need of Collection framework in Java since all the data operations(sorting/adding/deleting) are possible with Arrays and moreover array is suitable for memory consumption and performance is also better compared with Collections.
Can anyone point me a real time data oriented example which shows the difference in both(array/Collections) of these implementations.
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
Several reasons:
Java's collection classes provides a higher level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementation you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in arrays.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developers learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
Arrays are not efficient always. What if you need something like LinkedList? Looks like you need to learn some data structure : http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections came up with different functionality,usability and convenience.
When in an application we want to work on group of Objects, Only ARRAY can not help us,Or rather they might leads to do things with some cumbersome operations.
One important difference, is one of usability and convenience, especially given that Collections automatically expand in size when needed:
Collections came up with methods to simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Here you find use of each collection as per scenario:
Collection is the framework in Java and you know that framework is very easy to use rather than implementing and then use it and your concern is that why we don't use the array there are drawbacks of array like it is static you have to define the size of row at least in beginning, so if your array is large then it would result primarily in wastage of large memory.
So you can prefer ArrayList over it which is inside the collection hierarchy.
Complexity is other issue like you want to insert in array then you have to trace it upto define index so over it you can use LinkedList all functions are implemented only you need to use and became your code less complex and you can read there are various advantages of collection hierarchy.
Collection framework are much higher level compared to Arrays and provides important interfaces and classes that by using them we can manage groups of objects with a much sophisticated way with many methods already given by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
Treemap - TreeMap stores key-value pairs in a sorted ascending order and retrieval speed of an element out of a TreeMap is quite fast.
To learn more about Java collections, check out this article.