Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example:When would one use a Collection<T> instead of a List<T>
and when would you use a Map<T, V> instead of aList<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks
You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.
You can take a look at sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)
This book is very good and covers both the collection framework and generics.
You can check the documentation of the java collection API.
Anyway a basic rule is : be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.
A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.
The decision depends on your data and your needs to use the data.
You should use a map if you have data where you can identify each element with a specific key and want to access or find it by with this key.
You take a List if you don't have a key but you're interested in the order of the elements. like a bunch of Strings you want to store in the order the user entered it.
You take a Set if you don't want to store the same element twice.
Also interesting for your decision is if you're working in am multithreaded environment. So if many threads are accessing the same list at the same tame you would rather take a Vector instead of an ArrayList.
Btw. for some collections it is usefull if your data class implements an interface like comparable or at least overrides the equals function.
here you will find more information.
Most Java books will have a good expanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that expains the reasons why one Collection is selected over another.
The Head first Java also has a good intropduction but may not tackle the problem of which to select.
The answer to your question is how are you going to be using the data structure? And to get a better idea of the possibilities, it is good to look at the whole collections interfaces hierarchy. For simplicity sake, I am restricting this discussion only to the classic interfaces, and am ignoring all of the concurrent interfaces.
Collection
+- List
+- Set
+- SortedSet
Map
+- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag, it contains a number of values, but makes no further guarantees about them. A list is simply an ordered set of values, where the order is defined externally, not implicitly from the values themselves. A Set on the other hand is a group of values, no two of which are the same, however they are not ordered, neither explicitly, nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, then they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List? This depends largely on how you need to lookup your data. If you need to do (effectively) random lookups using a key, then you should be using a set, since the implementations of that give you either O(1) or O(lgn) lookups. Searching through the list is O(n). If however, you are performing some kind of "batch" process, that is you are processing each, and every, item in the list then a list, or Set if you need the uniqueness constraint, is more appropriate.
The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your user doesn't have to massage their data to use your method. They can pass you an ArrayList< A >, LinkedHashSet< A > or a Collection< A > and you will deal with. On receiving the data from the method, they have a lot more information in what they can do with it (list specific iterators for example) than they would in option 2.
Related
While I was solving a Java test I came up with the following question:
You need to store elements in a collection that guarantees that no
duplicates are stored and all elements can be accessed in natural
order. Which interface provides that capability?
A. java.util.Map
B. java.util.Set
C. java.util.List
D. java.util.Collection
I have no idea what is the right case here? We can store the same element in any of these collections unless in a Set, but the Set doesn't provide the natural order. What's wrong?
The correct answer for that test is Set Let's remember that it's asking for an interface that could provide that; given the right implementation, the Set interface could provide it.
The Map interface doesn't make any guarantees around what order things are stored, as that's implementation specific. However, if you use the right implementation (that is, TreeMap as spelled out by the docs), then you're guaranteed a natural ordering and no duplicate entries.
However, there's no requirement about key-value pairs.
The Set interface also doesn't make any guarantees around what order things are stored in, as that's implementation specific. But, like TreeMap, TreeSet is a set that can be used to store things in a natural order with no duplicates.
Here's how it'd look.
Set<String> values = new TreeSet<>();
The List interface will definitely allow duplicates, which instantly rules it out.
The Collection interface doesn't have anything directly implementing it, but it is the patriarch of the entire collections hierarchy. So, in theory, code like this is legal:
Collection<String> values = new TreeSet<>();
...but you'd lose information about what kind of collection it actually was, so I'd discourage its usage.
TreeSet would give you ordering (either natural ordering by default of custom ordering via a Comparator).
To be more general, SortedSet is the more general interface that offers uniqueness and ordering.
A Set that further provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time. The set's iterator will traverse the set in ascending element order. Several additional operations are provided to take advantage of the ordering.
If by natural order, you mean order of insertion, then LinkedHashSet is your go to Set implementation.
The correct answers are:
SortedSet gives guarantees, regarding natural order of elements.
TreeSet is typical implementation
Strictly speaking, when choosing from the above List is the only of the interfaces that has a defined order of iteration, however it does allow duplicates.
Set and Map on the other hands, does not allow duplicates (of keys for Map), but they also do not define the order of iteration, they are unordered by default, with HashSet/HashMap being the counter example.
Collection allows none.
So, strictly speaking - none of the suggested interfaces provide the desired capability,
However, as others suggested, there are specific implementations of the interfaces that do allow natural order of elements and no duplicates, mainly the SortedSet interface and its TreeSet implementation
To further elaborate why Set is not a good option, if you have a variable, let it be mySet, and you want it to be ordered, users are going to be surprised when you use the Set interface, imaigine the following code:
public int processMyDataStructure(Set set) {
//some calculation that assumes set is ordered
return result;
}
and users provide you a HashSet as argument - you are going to get a wrong behavior from your method, because Set does not guarantee ordering. To avoid it you should have asked for an SortedSet rather than Set.
I've come over this question yesterday on my interview test and need to comment on that: the question (assuming one of the listed A, B, C or D answers has to be correct) is plainly wrong. There is no correct answer listed.
Nothing in Set interface guarantees the order in which the elements are to be returned. And there is no such thing, as Makoto would like it in his accepted answer, as right implementation that could theoretically do the job, because we are not asked for any implementation here, but whether interface provides the requested capability.
So, the test question with the answers provided is misguided.
Referring a bit more to accepted answer, there is one more reason for it to be wrong. Specifically, Makoto argues, that The List interface will definitely allow duplicates, which instantly rules it out. This argument may be defied by citation from List specification saying:
It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare
so in my opinion, any of the answers given is equally wrong, or, as accepted answer wants it, equally correct, as we are free to write implementation of List (or Map, or Collection) behaving in any way we wish (in boundaries set by an interface specification), but interfaces and their specifications are here to guarantee some contract, and this question is really about them, not about possible implementations.
So, I know that coding to an interface (using an interface as a variable's declared type instead of its concrete type) is a good practice in OO code, for a bunch of reasons. This is seen a lot, for example, with Java collections. Well, is referring to an interface in your program still a good thing to do when only certain implementations of that interface provide correct behavior?
For example, I have a Java program. In that program, I have multiple sets of objects. I chose to use a Set, because I didn't want duplicate elements. However, I wanted a list's ordering property (i.e. maintain insertion order). Therefore, I am using a LinkedHashSet as the concrete Set type. One thing these sets are used for is computing a dot product involving the primitive fields of the objects contained in the sets, such as in (simplifying a bit):
double dot(LinkedHashSet<E> set, double[] array) {
double sum = 0.0;
int i = 0;
for(E element : set) {
sum += (element.getValue()*array[i]);
}
return sum;
}
This method's result is dependent on the set's iteration order, and so certain Set implementations, mainly HashSet, will give incorrect/unexpected results. Currently, I am using LinkedHashSet throughout my program as the declared type, instead of Set, to ensure correct behavior. However, that feels bad stylistically. What's the right thing to do here? Is it okay to use the concrete type in this case? Or maybe should I use Set as the type, but then state in the documentation which implementations will/won't produce correct behavior? I'm looking more for general input than anything specific to the scenario above. In particular, this should apply to really any scenario where you're using the ordering properties of a LinkedHashSet or TreeSet. How do you prevent unintended implementations from being used? Do you force it in the code (by ditching the interface), or do you specify it in the documentation? Or perhaps some other approach?
It is true that you should code to interfaces, but only if the assurances they make fit your needs. In your case, if you would only use Set then you are saying: I don't want duplicates, but I don't care about the order. You could also use a List and mean: I care about insertion order, but not about duplicates. There even is a SortedSet but it does not have the ordering you want. So in your case you can't replace LinkedHashSet by one of its interfaces without violating the Liskov substitution principle.
So I would argue that in your case you should stick to the implementation until you really need the to switch to another implementation. With modern IDEs refactoring is not that hard anymore so I would refrain from doing any premature optimizations -- YAGNI and KISS.
Very very great question. One solution is: Make another interface! Say one that extends SortedMap but has a getInsertionOrderIterator() method or an interface that extends Map & has getOrderIterator() & getInsertionOrderIterator() methods.
You can write a quick adapter class that contains a LinkedHashMap & TreeMap as the backend data structures.
You can make arguments for either way. As long as you and others maintaining this code know that particular implementations of Set might break the rest of the app or library, then coding to the interface is fine. However, if that is not true, then you should use the specific implementation.
The purpose of coding to an interface is to give you flexibility that will not break your app. Take JDBC for instance. If you use the wrong driver it will break your program similar to how you are describing here. However, if let's say Oracle decided to put behavior in their JDBC driver that subtly broke code written to the JDBC spec instead of the specific Oracle driver code then you'd have to choose.
There is no cut and dry, "this is always right" type of answer.
I am looking for a way to determine if a Collection (or maybe even any Iterable) is guaranteed to be ordered by its class contract.
I already know the Guava method : Ordering.natural().isOrdered(myCollection)
But this method is not relevant to my needs, because it checks if the values inside the collection are ordered. That's not what I need to determine, what I want to have is a isSorted method that will behave like this :
isSorted(new HashSet()) -> false
isSorted(new ArrayList()) -> true
etc...
What I am looking at would be typically implemented by checking the class of the collection, and comparing it to some kind of reference table of the collections which contract states that they are ordered, and only return true for these ones.
Do you know if something like this already exists in some library ?
You can do the following to determine if a collection is defined to be sorted.
collection instanceof SortedSet
There are three interfaces for ordered collections: List, SortedSet and SortedMap. You can check if your class is implementing one of them.
No, this doesn't exist in any library, and for good reason.
That library would have to know all the collection types that are flying around. If you're using Apache Commons Collections, it'd have to know about all of those. If you're using Guava, it'd have to know about all of those. If someone comes along and introduces a new collection type, you're now going to reject that type, even if it's ordered.
It doesn't make sense to provide that method in a library that can't know what other libraries you might have with whatever other collection types might be out there.
In an end application, it might make sense to implement it, with the heuristic techniques you've already been describing.
It might help if we knew what you were actually trying to do with this method.
What is the need of Collection framework in Java since all the data operations(sorting/adding/deleting) are possible with Arrays and moreover array is suitable for memory consumption and performance is also better compared with Collections.
Can anyone point me a real time data oriented example which shows the difference in both(array/Collections) of these implementations.
Arrays are not resizable.
Java Collections Framework provides lots of different useful data types, such as linked lists (allows insertion anywhere in constant time), resizeable array lists (like Vector but cooler), red-black trees, hash-based maps (like Hashtable but cooler).
Java Collections Framework provides abstractions, so you can refer to a list as a List, whether backed by an array list or a linked list; and you can refer to a map/dictionary as a Map, whether backed by a red-black tree or a hashtable.
In other words, Java Collections Framework allows you to use the right data structure, because one size does not fit all.
Several reasons:
Java's collection classes provides a higher level interface than arrays.
Arrays have a fixed size. Collections (see ArrayList) have a flexible size.
Efficiently implementing a complicated data structures (e.g., hash tables) on top of raw arrays is a demanding task. The standard HashMap gives you that for free.
There are different implementation you can choose from for the same set of services: ArrayList vs. LinkedList, HashMap vs. TreeMap, synchronized, etc.
Finally, arrays allow covariance: setting an element of an array is not guaranteed to succeed due to typing errors that are detectable only at run time. Generics prevent this problem in arrays.
Take a look at this fragment that illustrates the covariance problem:
String[] strings = new String[10];
Object[] objects = strings;
objects[0] = new Date(); // <- ArrayStoreException: java.util.Date
Collection classes like Set, List, and Map implementations are closer to the "problem space." They allow developers to complete work more quickly and turn in more readable/maintainable code.
For each class in the Collections API there's a different answer to your question. Here are a few examples.
LinkedList: If you remove an element from the middle of an array, you pay the cost of moving all of the elements to the right of the removed element. Not so with a linked list.
Set: If you try to implement a set with an array, adding an element or testing for an element's presence is O(N). With a HashSet, it's O(1).
Map: To implement a map using an array would give the same performance characteristics as your putative array implementation of a set.
It depends upon your application's needs. There are so many types of collections, including:
HashSet
ArrayList
HashMap
TreeSet
TreeMap
LinkedList
So for example, if you need to store key/value pairs, you will have to write a lot of custom code if it will be based off an array - whereas the Hash* collections should just work out of the box. As always, pick the right tool for the job.
Well the basic premise is "wrong" since Java included the Dictionary class since before interfaces existed in the language...
collections offer Lists which are somewhat similar to arrays, but they offer many more things that are not. I'll assume you were just talking about List (and even Set) and leave Map out of it.
Yes, it is possible to get the same functionality as List and Set with an array, however there is a lot of work involved. The whole point of a library is that users do not have to "roll their own" implementations of common things.
Once you have a single implementation that everyone uses it is easier to justify spending resources optimizing it as well. That means when the standard collections are sped up or have their memory footprint reduced that all applications using them get the improvements for free.
A single interface for each thing also simplifies every developers learning curve - there are not umpteen different ways of doing the same thing.
If you wanted to have an array that grows over time you would probably not put the growth code all over your classes, but would instead write a single utility method to do that. Same for deletion and insertion etc...
Also, arrays are not well suited to insertion/deletion, especially when you expect that the .length member is supposed to reflect the actual number of contents, so you would spend a huge amount of time growing and shrinking the array. Arrays are also not well suited for Sets as you would have to iterate over the entire array each time you wanted to do an insertion to check for duplicates. That would kill any perceived efficiency.
Arrays are not efficient always. What if you need something like LinkedList? Looks like you need to learn some data structure : http://en.wikipedia.org/wiki/List_of_data_structures
Java Collections came up with different functionality,usability and convenience.
When in an application we want to work on group of Objects, Only ARRAY can not help us,Or rather they might leads to do things with some cumbersome operations.
One important difference, is one of usability and convenience, especially given that Collections automatically expand in size when needed:
Collections came up with methods to simplify our work.
Each one has a unique feature:
List- Essentially a variable-size array;
You can usually add/remove items at any arbitrary position;
The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Used- Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
Set- Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
Used- Remembering "which items you've already processed", e.g. when doing a web crawl;
Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
Here you find use of each collection as per scenario:
Collection is the framework in Java and you know that framework is very easy to use rather than implementing and then use it and your concern is that why we don't use the array there are drawbacks of array like it is static you have to define the size of row at least in beginning, so if your array is large then it would result primarily in wastage of large memory.
So you can prefer ArrayList over it which is inside the collection hierarchy.
Complexity is other issue like you want to insert in array then you have to trace it upto define index so over it you can use LinkedList all functions are implemented only you need to use and became your code less complex and you can read there are various advantages of collection hierarchy.
Collection framework are much higher level compared to Arrays and provides important interfaces and classes that by using them we can manage groups of objects with a much sophisticated way with many methods already given by the specific collection.
For example:
ArrayList - It's like a dynamic array i.e. we don't need to declare its size, it grows as we add elements to it and it shrinks as we remove elements from it, during the runtime of the program.
LinkedList - It can be used to depict a Queue(FIFO) or even a Stack(LIFO).
HashSet - It stores its element by a process called hashing. The order of elements in HashSet is not guaranteed.
TreeSet - TreeSet is the best candidate when one needs to store a large number of sorted elements and their fast access.
ArrayDeque - It can also be used to implement a first-in, first-out(FIFO) queue or a last-in, first-out(LIFO) queue.
HashMap - HashMap stores the data in the form of key-value pairs, where key and value are objects.
Treemap - TreeMap stores key-value pairs in a sorted ascending order and retrieval speed of an element out of a TreeMap is quite fast.
To learn more about Java collections, check out this article.
Under what circumstances is an enum more appropriate than, for example, a Collection that guarantees unique elements (an implementer of java.util.Set, I guess...)?
(This is kind of a follow up from my previous question)
Basically when it's a well-defined, fixed set of values which are known at compile-time.
You can use an enum as a set very easily (with EnumSet) and it allows you to define behaviour, reference the elements by name, switch on them etc.
When the elements are known up front and won't change, an enum is appropriate.
If the elements can change during runtime, use a Set.
I am no java guru, but my guess is to use enumeration when you want to gurantee a certain pool of values, and to use a collection when you want to gurantee uniqueness. Example would be to enumerate days of the week (cant have "funday") and to have a collection of SSN (generic example i know!)
Great responses - I'll try and summarise, if just for my own reference - it kinda looks like you should use enums in two situations:
All the values you need are known at compile time, and either or both of the following:
you want better performance than your usual collection implementations
you want to limit the potential values to those specified at compile time
With the Collection over enumeration links that Jon gave, you can get the benefits of enum performance and safety as an implementation detail without incorporating it into your overall design.
Community wiki'd, please do edit and improve if you want to!
Note: you can have both with an EnumSet.
In some situations your business requires the creation of new items, but at the same time business logic based on some fixed items. For the fixed ones you want an enum, the new ones obviously require some kind of collection/db.
I've seen projects using a collection for this kind of items, resulting in business logic depending on data which can be deleted by the user. Never do this, but do create a separate enum for the fixed ones and a collection for the others, just as required.
An other solution is to use a collection with immutable objects for the fixed values. These items could also reside in a db, but have an extra flag so users cannot update / delete it.