I need a class that implements Iterable, and does not need to be safe for concurrent usage. Of the various options, such as LinkedList, HashSet, ArrayList etc, which is the lightest-weight?
To clarify the use-case, I need to be able to add a number of objects to the Iterable (typically 3 or 4), and then something else needs to iterate over it.
ArrayList. From the Javadoc
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
That entirely depends on what you mean by "lightest weight". What operations do you need to do, and how often? Do you know the final size beforehand? Are you trying to save execution time or memory?
I would agree that zkarthik that ArrayList is very often a good choice... but it will behave very badly if you want to create a large collection and then repeatedly remove the first element, for example. There's a good reason for there being so many different collections: they have different performance characteristics for different situations.
They all have very different features and behavior, so you should base your choice on how you will use them. For example, for random access and high locality, use an ArrayList; if you need fast unordered insertion and querying, use a HashSet.
If by 'lightweight', you mean 'best performance' then the question is almost impossible to answer without understanding how the collection will be used. All you've told us so for is that it doesn't need to support concurrent usage, but in order to have any hope of answering the question we'd need to know things like
How many objects will be stored in the collection (on average)
What is the relative frequency of read and write access
Is random-access required
Is ordered access required
A number of people have suggested ArrayList may be best. However, I seem to recall reading (possibly in Effective Java 2nd edition), that for certain patterns of usage, Queue performs better than List, because it does not incurr the penalty of random access. In other words, you can add/remove items from a List in any order, but you can only add/remove items in a queue in a specific order (i.e. add to the tail, and remove from the head).
Related
I must work with a Collection, and I am not sure about using a List or a Set. This collection must be sorted, but not by the order of insertion but for another one, so each time a new item is added, a Comparator should be executed in order to reorder the Collection. So, for this reason, an ArrayList could be the best option.
Removing objects from that Collection must be possible too, furthermore, I would really appreciate using removeIf method, so a Set would be the best option here.
Getting and iterating over the Collection will be the most repeated scenario, so it must have a good performance in this scenario.
Seeing that, I think that a Set would be a good decision, however, I was thinking about converting the Set into a List when adding items, then, once the list has been resorted, converting it back to a Set. Is it bad performing? What do you think?
Thanks in advance
Unless you have bulk inserts during which you would need no sorting, TreeSet is fine. Simply measure both solutions.
With TreeSet inserting already ordered items, like rereading a set from disk, performs bad in that even a balanced tree, will have a bit too large depth. That however can be remedied.
For better performance you might go for a B-tree (needs 3rd party code) instead of the binary TreeSet. Measure that too, as typically a facet such as deletion with rebalancing might be done suboptimally.
This depends a lot on how you fill and use your collection and performance of which operation is the most important.
Do you fill the collection with items at once? Or add new elements from time to time? Does the performance of adding elements matter? Or only the iteration performance is important?
If performance is critical, it might make sense to implement a few solutions and compare their performance using a benchmark.
I personally don't believe that iteration performance of a TreeSet is that much worse that ArrayLists or LinkedLists or LinkedHashMaps. Especially compared to linked data structures. Iteration on a tree should not be that different in the performance. But I have no data, so this is just a belief here.
Below are two implementation ideas.
First, if you load a lot of data at once and then add new items rather seldom, load the data into an ArrayList and sort it using Collections.sort. If you need to add another item do a binary search (Collections.binarySearch) and insert the element at the corresponding position. Wrap it all in a custom List implementation and you're good to go.
Next, if you fill the collection with the data "in the beginning" and then the collection is hardly modified, you may simply cache the iteration order in an ArrayList. Every time the collection is modified, reset this list and. When iteration is requested and the list is not null, just use it, otherwise first fill it in the order of the sorted set.
I hope that this question is specific enough to be deemed fit for StackOverflow. I checked the FAQ and I think this qualifies, since it is specific and related to programming.
I'm implementing a complex data mining algorithm (FP-growth) in Java. Some of the initial phases of the algorithm require me to scan a large database and keep a running count of each item type found. This seems perfectly suited to a Hashbag interface. I found one in Apache Commons which seems to work for me.
So now, my HashBag is filled with [itemType, count] entries (pairs). Later on in the algorithm, I'm required to do a lot of list-like operations on these pairs. In some cases, I must sort the collection by itemType. In others, I must sort by count. This seems perfectly suited to a List interface.
I'm left with the conclusion that I must convert my Hasbag to a List. Yet it feels dirty somehow, like a waste of space and time. Is there a smarter way to do this, or is it a common situation to have a programming problem where you must treat your collection differently at different times, and conversions are a necessary evil?
One alternative is to make my own interface which is truly a list, but allows "bag-style" adds. I'd have to keep the list sorted and perform binary searches with a custom comparator every time I wanted to add something. Building that collection would probably take longer than building a Hashbag, but I'd save on the conversion step at the end. Any thoughts as to which is preferable?
Thanks!
If you used Guava's Multiset instead of Apache's Bag -- roughly analogous, but in a different style -- you can do most of this without converting. Multiset.entrySet() returns a Set<Entry<E>>, with Entry<E> effectively representing a pair of an element and a count -- that sounds like it's probably the best way to address your need to operate on the element-count pairs, maybe? You can iterate over that like you'd iterate over a Map.entrySet().
You can use Multisets.copyHighestCountFirst(Multiset) to get a multiset reordered in highest-frequency-first order, and use TreeMultiset to order by the elements directly.
(Disclosure: I contribute to Guava.)
I assume you're using the Apache Commons Collections HashBag class. Have you considered using TreeBag instead? It implements the same Bag interface but efficiently keeps the data sorted according to a comparator you provide.
That said, when you need to change sort order, there isn't usually any better answer than to copy the collection to a new one with a different comparator.
Yet it feels dirty somehow, like a waste of space and time. Is there a smarter way to do this, or is it a common situation to have a programming problem where you must treat your collection differently at different times, and conversions are a necessary evil?
Sometimes it is necessary to convert between collection types. If it is necessary "dirty" or "inelegant" or "dumb" are not really relevant.
It can also be a mistake to over-think these things up front. The actual computational trade-offs are often difficult to grasp. For instance, if you changed the HashBag to a TreeBag, insertion goes from O(1) to O(logN) but you then avoid the overheads of sorting and copying. "Big Oh" analysis / thinking is not going to give you a clear answer. Indeed, the real performance is going to depend on the scaling factors, the values of N, the ratio of hits and misses in the bag and so on.
I would advise to try implementing things the obvious way, and see if it performs well enough ... and if not, profile it to see if the data structures are the main bottleneck. Then based on the profiling, and other measurements of the input datasets, figure out the best way to improve performance from your baseline implementation.
Answering my own question!
I did some experimenting with the different types of Multiset provided by the Guava libary mentioned above by Louis Wasserman. In my particular test case, I'm parsing a 1GB XML file (database of books and authors) and creating a very large Multiset (keeping a count of how many times each author shows up in the DB). Once I reach the end of the parsing, I need to get a new Multiset which only contains the authors who showed up more than x times, where x is some threshold value. I also want my final set to be sorted by author name.
Here are two of the different ways I tried it (among others):
1) collect the original counts in a TreeMultiset and then remove any which don't meet the threshold
2) collect the original counts in a HashMultiset, and then create a new TreeMultiset where I add each item from the hash set with a count the meets the threshold
The second way proved to be significantly faster (roughly 25%), despite the conversion and extra memory usage. Obviously a big part of this is that it is pretty inefficient to delete from binary trees.
So the clear conclusion here is that in this situation, conversion is a good move (unless you have memory constraints that won't allow it).
Thanks again for turning me onto the Guava library, Louis!
I'm a student and fairly new to Java. I was looking over the different speeds achieved by the two collections in Java, Linked List, and ArrayList. I know that an ArrayList is much much faster at looking up and placing in values into its indexes. My question is:
how can one make a linked list faster, if at all possible?
Thanks for any help.
zmahir
When talking about speed, perhaps you mean complexity. Insertion and retrieval operations for ArrayList (and arrays) are O(1), while for LinkedList they are O(n). And this cannot be changed - it is 'by definition'.
O(n) means that in order to insert an object at a given position, or retrieve it, you must traverse, in the worst case, all (n) the items in the list. Hence n operations. For ArrayList this is only one operation.
You probably can't. You don't know the size (well, ok you can), nor the location of each element. To find element 100 in a linked list, you need to start with item 1, find it's link to item 2, etc. until you find 100. This makes inserting into this list a tedious job.
There are many alternatives depending on your exact goals. You can use b-trees or similar methods to split the large linked list into smaller ones. Or use hashlists if you want to quickly find items. Or use simple arrays. But if you want a list that performs like an ArrayList, why not use an ArrayList?
You can split off regions which are linked to the main linked list, so this gives you entry points directly inside the list so you don't have to walk up to them. See the subList method here: http://download.oracle.com/javase/1.4.2/docs/api/java/util/AbstractList.html. This is useful if you have a number of 'sentences' made out of words, say. You can use a separate linked list to iterate over the sentences, which are sublists of the main linked list.
You can also use a ListIterator when adding, removing, or accessing elements. This helps greatly with increasing the speed of sequential access. See the listIterator method for this, and the class: http://download.oracle.com/javase/1.4.2/docs/api/java/util/ListIterator.html.
Speed of a linked list could be improved by using skip lists: http://igoro.com/archive/skip-lists-are-fascinating/
a linked list uses pointers to walk through the items, so for example if you asked for the 5th item, the runtime will start from the first item and walks through each pointer until it reaches the 5th item.
there is really not much you can do about it. a linked list may not be a good choice if you need fast acces to items. although there are some optimizations for it such as creating a circular linked list or a double linked list where you can walk back and forth the list but this really depends on the business logic and the application requirements.
my advise is to avoid linked lists if it does not match your needs and changing to a different data structure might be the best approach.
As a general rule, data structures are designed to do certain things well. LinkedLists are designed to be faster than ArrayLists at inserting elements and removing elements and about the same as ArrayLists at iterating across the list in order. When you change the way a LinkedList works, you make it no longer a true LinkedList, so there's not really any way to modify them to be faster at something and still be a LinkedList.
You'll need to examine the way you're using this particular collection and decide whether a LinkedList is really the best data structure for your purposes. If you share with us how you're using it, and why you need it to be faster, then we can advise you on which data structure you ought to consider using.
Lots of people smarter than you or I have looked at the implementation of the Java collection classes. If there were an optimization to be made, they would have found it and already made it.
Since the collection classes are pretty much as optimized as they can be, our primary task should be to choose the correct one.
When choosing your collection type, don't forget about things like HashSet. If order doesn't matter, and you don't need to put duplicates in the collection, then HashSet may be appropriate.
I'm a student and fairly new to Java. ... how can one make a linked list faster, if at all possible?
The standard Java collection type (indeed all data structures implemented in any language!) represent compromises on various "measures" such as:
The amount of memory needed to represent the data structure.
The time taken to perform various operations; e.g. for a "list" the operations of interest are insertion, removal, indexing, contains, iteration and so on.
How easy or hard it is to integrate / reuse the collection type; see below.
So for instance:
ArrayList offers lower memory overheads, fast indexing (O(1)), but slow contains, random insertion and removal (O(N)).
LinkedList has higher memory overheads, slow indexing and contains (O(N)), but faster removal (O(1)) under certain circumstances.
The various performance measures are typically determines by the maths of the various data structures. For example, if you have a chain of nodes, the only way to get the ith node is to step through them from the beginning. This involves following i pointers.
Sometimes you can modify the data structures to improve one aspect of the performance. But this typically comes at the cost of some other aspect of the performance. (For example, you could add a separate index to make indexing of a linked list faster. But the cost of maintaining the index on insertion / deletion would mean that you'd probably be better of using an ArrayList.)
In some cases the integration / reuse requirements have significant impact on performance.
For example, it is theoretically possible to optimize a linked list's space usage by adding a next field to the list element type, combining the element and node objects and saving 16 or so bytes per list entry. However, this would make the list type less general (the member/element class would need to implement a specific interface), and has the restriction that an element can belong to at most one list at any time. These restrictions are so limiting that this approach is rarely used in Java.
For a second example, consider the problem of inserting at a given position in a linked list. For the LinkedList class, this is normally an O(N) operation, because you have to step through the list to find the position. In theory, if an application could find and remember a position, it should be able to perform the insertion at that position in O(1). Unfortunately, neither the List APIs provides no way to "remember" a position.
While neither of these examples is a fundamental roadblock to a developer "doing his own thing", they illustrate that using general data structure APIs and general implementations of those APIs has performance implications, and therefore represents a trade-off between performance and ease-of-use.
I'm a bit surprised by the answers here. There are big difference between the theoretical performance of LinkedLists and ArrayLists compared to the actual performance of the Java implementations.
What makes the Java LinkedList slower than a theoretical LinkedList is that it does a lot more than just the operations. For example it checks for concurrent modifications and other safeties.
If you know your use case, you can write a your own simple implementation of a LinkedList and it will be much faster.
All,
I have been going through a lot of sites that post about the performance of various Collection classes for various actions i.e. adding an element, searching and deleting. But I also notice that all of them provide different environments in which the test was conducted i.e. O.S, memory, threads running etc.
My question is, if there is any site/material that provides the same performance information on best test environment basis? i.e. the configurations should not be an issue or catalyst for poor performance of any specific data structure.
[Updated]: Example, HashSet and LinkedHashSet both have a complexity of O (1) for inserting an element. However, Bruce Eckel' test claims that insertion is going to take more time for LinkedHashSet than for HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still go by the Big-Oh notation ?
Here are my recommendations:
First of all, don't optimize :) Not that I am telling you to design crap software, but just to focus on design and code quality more than premature optimization. Assuming you've done that, and now you really need to worry about which collection is best beyond purely conceptual reasons, let's move on to point 2
Really, don't optimize yet (roughly stolen from M. A. Jackson)
Fine. So your problem is that even though you have theoretical time complexity formulas for best cases, worst cases and average cases, you've noticed that people say different things and that practical settings are a very different thing from theory. So run your own benchmarks! You can only read so much, and while you do that your code doesn't write itself. Once you're done with the theory, write your own benchmark - for your real-life application, not some irrelevant mini-application for testing purposes - and see what actually happens to your software and why. Then pick the best algorithm. It's empirical, it could be regarded as a waste of time, but it's the only way that actually works flawlessly (until you reach the next point).
Now that you've done that, you have the fastest app ever. Until the next update of the JVM. Or of some underlying component of the operating system your particular performance bottleneck depends on. Guess what? Maybe your clients have different ones. Here comes the fun: you need to be sure that your benchmark is valid for others or in most cases (or have fun writing code for different cases). You need to collect data from users. LOTS. And then you need to do that over and over again to see what happens and if it still holds true. And then re-write your code accordingly over and over again (The - now terminated - Engineering Windows 7 blog is actually a nice example of how user data collection helps to make educated decisions to improve user experience.
Or you can... you know... NOT optimize. Platforms and compilers will change, but a good design should - on average - perform well enough.
Other things you can also do:
Have a look at the JVM's source code. It's very educative and you discover a herd of hidden things (I'm not saying that you have to use them...)
See that other thing on your TODO list that you need to work on? Yes, the one near the top but that you always skip because it's too hard or not fun enough. That one right there. Well get to it and leave the optimization thingy alone: it's the evil child of a Pandora's Box and a Moebius band. You'll never get out of it, and you'll deeply regret you tried to have your way with it.
That being said, I don't know why you need the performance boost so maybe you have a very valid reason.
And I am not saying that picking the right collection doesn't matter. Just that ones you know which one to pick for a particular problem, and that you've looked at alternatives, then you've done your job without having to feel guilty. The collections have usually a semantic meaning, and as long as you respect it you'll be fine.
In my opinion, all you need to know about a data structure is the Big-O of the operations on it, not subjective measures from different architectures. Different collections serve different purposes.
Maps are dictionaries
Sets assert uniqueness
Lists provide grouping and preserve iteration order
Trees provide cheap ordering and quick searches on dynamically changing contents that require constant ordering
Edited to include bwawok's statement on the use case of tree structures
Update
From the javadoc on LinkedHashSet
Hash table and linked list implementation of the Set interface, with predictable iteration order.
...
Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
Now we have moved from the very general case of choosing an appropriate data-structure interface to the more specific case of which implementation to use. However, we still ultimately arrive at the conclusion that specific implementations are well suited for specific applications based on the unique, subtle invariant offered by each implementation.
What do you need to know about them, and why? The reason that benchmarks show a given JDK and hardware setup is so that they could (in theory) be reproduced. What you should get from benchmarks is an idea of how things will work. For an ABSOLUTE number, you will need to run it vs your own code doing your own thing.
The most important thing to know is the Big O runtime of various collections. Knowing that getting an element out of an unsorted ArrayList is O(n), but getting it out of a HashMap is O(1) is HUGE.
If you are already using the correct collection for a given job, you are 90% of the way there. The times when you need to worry about how fast you can, say, get items out of a HashMap should be pretty darn rare.
Once you leave single threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronized hashmap. Until you are multi threaded, you can just not worry about this kind of stuff and focus on which collection for which use.
Update to HashSet vs LinkedHashSet
I haven't ever found a use case where I needed a Linked Hash Set (because if I care about order I tend to have a List, if I care about O(1) gets, I tend to use a HashSet. Realistically, most code will use ArrayList, HashMap, or HashSet. If you need anything else, you are in a "edge" case.
The different collection classes have different big-O performances, but all that tells you is how they scale as they get large. If your set is big enough the one with O(1) will outperform the one with O(N) or O(logN), but there's no way to tell what value of N is the break-even point, except by experiment.
Generally, I just use the simplest possible thing, and then if it becomes a "bottleneck", as indicated by operations on that data structure taking much percent of time, then I will switch to something with a better big-O rating. Quite often, either the number of items in the collection never comes near the break-even point, or there's another simple way to resolve the performance problem.
Both HashSet and LinkedHashSet have O(1) performance. Same with HashMap and LinkedHashMap (actually the former are implemented based on the later). This only tells you how these algorithms scale, not how they actually perform. In this case, LinkHashSet does all the same work as HashSet but also always has to update a previous and next pointer to maintain the order. This means that the constant (this is an important value also when talking about actual algorithm performance) for HashSet is lower than LinkHashSet.
Thus, since these two have the same Big-O, they scale the same essentially - that is, as n changes, both have the same performance change and with O(1) the performance, on average, does not change.
So now your choice is based on functionality and your requirements (which really should be what you consider first anyway). If you only need fast add and get operations, you should always pick HashSet. If you also need consistent ordering - such as last accessed or insertion order - then you must also use the Linked... version of the class.
I have used the "linked" class in production applications, well LinkedHashMap. I used this in one case for a symbol like table so wanted quick access to the symbols and related information. But I also wanted to output the information in at least one context in the order that the user defined those symbols (insertion order). This makes the output more friendly for the user since they can find things in the same order that they were defined.
If I had to sort millions of rows I'd try to find a different way. Maybe I could improve my SQL, improve my algorithm, or perhaps write the elements to disk and use the operating system's sort command.
I've never had a case where collections where the cause of my performance issues.
I created my own experimentation with HashSets and LinkedHashSets. For add() and contains the running time is O(1) , not taking into consideration for a lot of collisions. In the add() method for a linkedhashset, I put the object in a user created hash table which is O(1) and then put the object in a separate linkedlist to account for order. So the running time to remove an element from a linkedhashset, you must find the element in the hashtable and then search through the linkedlist that has the order. So the running time is O(1) + O(n) respectively which is o(n) for remove()
Anyone have a good rule of thumb for choosing between different implementations of Java Collection interfaces like List, Map, or Set?
For example, generally why or in what cases would I prefer to use a Vector or an ArrayList, a Hashtable or a HashMap?
I really like this cheat sheet from Sergiy Kovalchuk's blog entry, but unfortunately it is offline. However, the Wayback Machine has a historical copy:
More detailed was Alexander Zagniotov's flowchart, also offline therefor also a historical copy of the blog:
Excerpt from the blog on concerns raised in comments:
"This cheat sheet doesn't include rarely used classes like WeakHashMap, LinkedList, etc. because they are designed for very specific or exotic tasks and shouldn't be chosen in 99% cases."
I'll assume you know the difference between a List, Set and Map from the above answers. Why you would choose between their implementing classes is another thing. For example:
List:
ArrayList is quick on retrieving, but slow on inserting. It's good for an implementation that reads a lot but doesn't insert/remove a lot. It keeps its data in one continuous block of memory, so every time it needs to expand, it copies the whole array.
LinkedList is slow on retrieving, but quick on inserting. It's good for an implementation that inserts/removes a lot but doesn't read a lot. It doesn't keep the entire array in one continuous block of memory.
Set:
HashSet doesn't guarantee the order of iteration, and therefore is fastest of the sets. It has high overhead and is slower than ArrayList, so you shouldn't use it except for a large amount of data when its hashing speed becomes a factor.
TreeSet keeps the data ordered, therefore is slower than HashSet.
Map: The performance and behavior of HashMap and TreeMap are parallel to the Set implementations.
Vector and Hashtable should not be used. They are synchronized implementations, before the release of the new Collection hierarchy, thus slow. If synchronization is needed, use Collections.synchronizedCollection().
I've always made those decisions on a case by case basis, depending on the use case, such as:
Do I need the ordering to remain?
Will I have null key/values? Dups?
Will it be accessed by multiple threads
Do I need a key/value pair
Will I need random access?
And then I break out my handy 5th edition Java in a Nutshell and compare the ~20 or so options. It has nice little tables in Chapter five to help one figure out what is appropriate.
Ok, maybe if I know off the cuff that a simple ArrayList or HashSet will do the trick I won't look it all up. ;) but if there is anything remotely complex about my indended use, you bet I'm in the book. BTW, I though Vector is supposed to be 'old hat'--I've not used on in years.
Theoretically there are useful Big-Oh tradeoffs, but in practice these almost never matter.
In real-world benchmarks, ArrayList out-performs LinkedList even with big lists and with operations like "lots of insertions near the front." Academics ignore the fact that real algorithms have constant factors that can overwhelm the asymptotic curve. For example, linked-lists require an additional object allocation for every node, meaning slower to create a node and vastly worse memory-access characteristics.
My rule is:
Always start with ArrayList and HashSet and HashMap (i.e. not LinkedList or TreeMap).
Type declarations should always be an interface (i.e. List, Set, Map) so if a profiler or code review proves otherwise you can change the implementation without breaking anything.
About your first question...
List, Map and Set serve different purposes. I suggest reading about the Java Collections Framework at http://java.sun.com/docs/books/tutorial/collections/interfaces/index.html.
To be a bit more concrete:
use List if you need an array-like data structure and you need to iterate over the elements
use Map if you need something like a dictionary
use a Set if you only need to decide if something belongs to the set or not.
About your second question...
The main difference between Vector and ArrayList is that the former is synchronized, the latter is not synchronized. You can read more about synchronization in Java Concurrency in Practice.
The difference between Hashtable (note that the T is not a capital letter) and HashMap is similiar, the former is synchronized, the latter is not synchronized.
I would say that there are no rule of thumb for preferring one implementation or another, it really depends on your needs.
For non-sorted the best choice, more than nine times out of ten, will be: ArrayList, HashMap, HashSet.
Vector and Hashtable are synchronised and therefore might be a bit slower. It's rare that you would want synchronised implementations, and when you do their interfaces are not sufficiently rich for thier synchronisation to be useful. In the case of Map, ConcurrentMap adds extra operations to make the interface useful. ConcurrentHashMap is a good implementation of ConcurrentMap.
LinkedList is almost never a good idea. Even if you are doing a lot of insertions and removal, if you are using an index to indicate position then that requires iterating through the list to find the correct node. ArrayList is almost always faster.
For Map and Set, the hash variants will be faster than tree/sorted. Hash algortihms tend to have O(1) performance, whereas trees will be O(log n).
Lists allow duplicate items, while Sets allow only one instance.
I'll use a Map whenever I'll need to perform a lookup.
For the specific implementations, there are order-preserving variations of Maps and Sets but largely it comes down to speed. I'll tend to use ArrayList for reasonably small Lists and HashSet for reasonably small sets, but there are many implementations (including any that you write yourself). HashMap is pretty common for Maps. Anything more than 'reasonably small' and you have to start worrying about memory so that'll be way more specific algorithmically.
This page has lots of animated images along with sample code testing LinkedList vs. ArrayList if you're interested in hard numbers.
EDIT: I hope the following links demonstrate how these things are really just items in a toolbox, you just have to think about what your needs are: See Commons-Collections versions of Map, List and Set.
Well, it depends on what you need. The general guidelines are:
List is a collection where data is kept in order of insertion and each element got index.
Set is a bag of elements without duplication (if you reinsert the same element, it won't be added). Data doesn't have the notion of order.
Map You access and write your data elements by their key, which could be any possible object.
Attribution: https://stackoverflow.com/a/21974362/2811258
For more information about Java Collections, check out this article.
As suggested in other answers, there are different scenarios to use correct collection depending on use case. I am listing few points,
ArrayList:
Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them. Iterating is faster as its index based.
Whenever you create an ArrayList, a fixed amount of memory is allocated to it and once exceeded, it copies the whole array
LinkedList:
It uses doubly linked list so insertion and deletion operation will be fast as it will only add or remove a node.
Retrieving is slow as it will have to iterate through the nodes.
HashSet:
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Remembering "which items you've already processed", e.g. when doing a web crawl;
HashMap:
Used in cases where you need to say "for a given X, what is the Y"? It is often useful for implementing in-memory caches or indexes i.e key value pairs For example:
For a given user ID, what is their cached name/User object?.
Always go with HashMap to perform a lookup.
Vector and Hashtable are synchronized and therefore bit slower and If synchronization is needed, use Collections.synchronizedCollection().
Check This for sorted collections.
Hope this hepled.
I found Bruce Eckel's Thinking in Java to be very helpful. He compares the different collections very well. I used to keep a diagram he published showing the inheritance heirachy on my cube wall as a quick reference. One thing I suggest you do is keep in mind thread safety. Performance usually means not thread safe.
Use Map for key-value pairing
For key-value tracking, use Map implementation.
For example, tracking which person is covering which day of the weekend. So we want to map a DayOfWeek object to an Employee object.
Map < DayOfWeek , Employee > weekendWorker =
Map.of(
DayOfWeek.SATURDAY , alice ,
DayOfWeek.SUNDAY , bob
)
;
When choosing one of the Map implementations, there are several aspects to consider. These include: concurrency, tolerance for NULL values in key and/or value, order when iterating keys, tracking by reference versus content, and convenience of literals syntax.
Here is a chart I made showing the various aspects of each of the ten Map implementations bundled with Java 11.