I just read the answers to "Is there a no-duplicate List implementation out there?", which asks about a List (not Set) implementation that doesn't allow duplicates. The accepted answer recommended the Collections15 SetUniqueList. Is there an equivalent in Guava (I searched the docs and couldn't find one) or in other libraries, or is there some other currently popular solution?
I like to use the correct Java semantics when I have a Collection that should not contain duplicates, and that means using the Set interface. That said, in many cases you want to preserve insertion order as a List would; maintaining two parallel data structures for that seems wasteful and complicated to keep in sync. That is why I use something like this: InsertionOrderSet.java. This is a special implementation of SortedSet that uses a wrapper object to maintain an index that a comparator can sort on, but hides that implementation detail from external consumers so it just looks like a regular, type-safe SortedSet.
The solutions in the answer you linked seem quite adequate. I'd go with one of those. Alternatively, you could simply extend ArrayList and override the relevant methods so that each one checks against a HashSet before calling the corresponding super method. Simply add the Set as an instance field, use it for checking dupes, and add/remove whatever items get added to/removed from the list.
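A minimal sketch of that ArrayList-plus-HashSet idea might look like the following. The class name UniqueArrayList is hypothetical, and only the most common mutators are covered; set(int, E), addAll(int, Collection), removeAll, retainAll, removeIf, iterators and sub-lists would need the same treatment before relying on this.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name; not part of any library.
class UniqueArrayList<E> extends ArrayList<E> {
    // Mirrors the list's contents so duplicate checks are O(1) on average.
    private final Set<E> seen = new HashSet<>();

    @Override
    public boolean add(E e) {
        // Only append when the element was not already present.
        return seen.add(e) && super.add(e);
    }

    @Override
    public void add(int index, E e) {
        if (seen.contains(e)) {
            return; // silently ignore duplicates (a design choice, not a rule)
        }
        super.add(index, e); // may throw IndexOutOfBoundsException
        seen.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);
        }
        return changed;
    }

    @Override
    public boolean remove(Object o) {
        seen.remove(o);
        return super.remove(o);
    }

    @Override
    public E remove(int index) {
        E removed = super.remove(index);
        seen.remove(removed);
        return removed;
    }

    @Override
    public void clear() {
        seen.clear();
        super.clear();
    }
}
```

Silently ignoring a duplicate bends the List.add contract, which is documented to return true; throwing an exception instead would be the other obvious choice.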
While I was solving a Java test I came up with the following question:
You need to store elements in a collection that guarantees that no
duplicates are stored and all elements can be accessed in natural
order. Which interface provides that capability?
A. java.util.Map
B. java.util.Set
C. java.util.List
D. java.util.Collection
I have no idea which answer is right here. We can store the same element more than once in any of these collections except a Set, but Set doesn't provide natural ordering. What am I missing?
The correct answer for that test is Set. Let's remember that it's asking for an interface that could provide that; given the right implementation, the Set interface could provide it.
The Map interface doesn't make any guarantees around what order things are stored, as that's implementation specific. However, if you use the right implementation (that is, TreeMap as spelled out by the docs), then you're guaranteed a natural ordering and no duplicate entries.
However, the question states no requirement for key-value pairs, so Map is not what it is asking for.
The Set interface also doesn't make any guarantees around what order things are stored in, as that's implementation specific. But, like TreeMap, TreeSet is a set that can be used to store things in a natural order with no duplicates.
Here's how it'd look.
Set<String> values = new TreeSet<>();
The List interface will definitely allow duplicates, which instantly rules it out.
The Collection interface doesn't have anything directly implementing it, but it is the patriarch of the entire collections hierarchy. So, in theory, code like this is legal:
Collection<String> values = new TreeSet<>();
...but you'd lose information about what kind of collection it actually was, so I'd discourage its usage.
TreeSet would give you ordering (either natural ordering by default or custom ordering via a Comparator).
More generally, SortedSet is the interface that offers both uniqueness and ordering.
A Set that further provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time. The set's iterator will traverse the set in ascending element order. Several additional operations are provided to take advantage of the ordering.
If by natural order you mean order of insertion, then LinkedHashSet is your go-to Set implementation.
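To make the difference between the two orderings concrete, here is a small sketch. The class and method names are made up for illustration; TreeSet and LinkedHashSet are the standard java.util classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrderDemo {
    // Duplicates dropped, elements in natural (Comparable) order.
    static List<String> naturalOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // Duplicates dropped, elements in order of first insertion.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(naturalOrder(input));   // [apple, fig, pear]
        System.out.println(insertionOrder(input)); // [pear, apple, fig]
    }
}
```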
The correct answers are:
SortedSet guarantees the natural order of elements.
TreeSet is the typical implementation.
Strictly speaking, of the interfaces listed above, List is the only one with a defined order of iteration; however, it does allow duplicates.
Set and Map, on the other hand, do not allow duplicates (of keys, for Map), but they also do not define an order of iteration. They are unordered by default, with HashSet and HashMap being typical examples.
Collection guarantees neither.
So, strictly speaking, none of the suggested interfaces provides the desired capability.
However, as others suggested, there are more specific types that do provide natural ordering of elements and no duplicates, mainly the SortedSet interface and its TreeSet implementation.
To further elaborate on why Set is not a good option: if you have a variable, say mySet, and you want it to be ordered, users are going to be surprised when you use the Set interface. Imagine the following code:
public int processMyDataStructure(Set set) {
//some calculation that assumes set is ordered
return result;
}
and users pass you a HashSet as the argument: your method will misbehave, because Set does not guarantee ordering. To avoid this, you should have asked for a SortedSet rather than a Set.
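As a sketch of what asking for a SortedSet could look like, consider a calculation that really does depend on iteration order. The class and method names are hypothetical, and the method assumes the set uses natural ordering rather than a custom Comparator.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class SortedInput {
    /**
     * Returns the smallest difference between neighbouring elements.
     * Relies on ascending iteration order, which the SortedSet parameter
     * type makes part of the method's contract (assuming natural ordering).
     * Returns Integer.MAX_VALUE when there are fewer than two elements.
     */
    static int smallestGap(SortedSet<Integer> values) {
        int best = Integer.MAX_VALUE;
        Integer previous = null;
        for (Integer v : values) {
            if (previous != null) {
                best = Math.min(best, v - previous);
            }
            previous = v;
        }
        return best;
    }

    public static void main(String[] args) {
        SortedSet<Integer> values = new TreeSet<>();
        values.add(10);
        values.add(3);
        values.add(7);
        System.out.println(smallestGap(values)); // 3
    }
}
```

With this signature, passing a HashSet is a compile-time error instead of a silent wrong result.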
I came across this question yesterday in an interview test and need to comment on it: the question (assuming one of the listed answers A, B, C or D has to be correct) is plainly wrong. There is no correct answer listed.
Nothing in the Set interface guarantees the order in which the elements are returned. And there is no such thing, as Makoto would have it in his accepted answer, as a right implementation that could theoretically do the job, because we are not asked about any implementation here, but whether the interface provides the requested capability.
So, the test question with the answers provided is misguided.
Referring a bit more to the accepted answer, there is one more reason for it to be wrong. Specifically, Makoto argues that the List interface will definitely allow duplicates, which instantly rules it out. This argument can be countered by a quotation from the List specification:
It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare
so in my opinion, every one of the answers given is equally wrong or, as the accepted answer would have it, equally correct, since we are free to write an implementation of List (or Map, or Collection) that behaves any way we wish (within the boundaries set by the interface specification). But interfaces and their specifications exist to guarantee some contract, and this question is really about them, not about possible implementations.
Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?
These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe performance characteristics without exposing implementation details. The one interface that describes performance, RandomAccess, is a marker interface, and it doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to be able to refer to, say, a HashSet (HashSet also implements Collection, as do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because point 2 says so (always program to an interface).
Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.
Sure. If, for example, Java introduces a more efficient implementation of List, but your API accepts only LinkedList, you won't be able to switch implementations once clients depend on that API. If you use the interface, you can easily replace the implementation without breaking the API.
They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.
My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List just because the methods available on Collection aren't very powerful, and the distinction between a List and another Collection (Map, Set, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
How come Java provides several different implementations of the Set type, including HashSet and TreeSet and not ArraySet?
A set based solely on an array of elements in no particular order would always have O(n) time for a containment check. It wouldn't be terribly useful, IMO. When would you want to use that instead of HashSet or TreeSet?
The most useful aspect of an array is that you can get to an element with a particular index extremely quickly. That's not terribly relevant when it comes to sets.
There is CopyOnWriteArraySet which is a set backed by an array.
This is not particularly useful as its performance is not great for large collections.
Android has android.util.ArraySet (introduced in API level 23) and android.util.ArrayMap (introduced in API level 19).
Actually, the concrete implementation of a Set does not matter much here. Any Set stores elements and guarantees their uniqueness.
I cannot be sure, but it sounds like you want a Set implementation that preserves the order of elements. If I am right, use LinkedHashSet.
Java provides multiple implementations of its collection interfaces to allow for the best performance. ArrayList performs well on many List operations.
For Set operations, which always require uniqueness, different implementations offer better performance. If a Set were implemented using a plain array, any modification would have to run through all the array elements to check whether the element is already in the Set. HashSet and TreeSet simplify this check greatly.
The Set interface has no get-by-index method, such as List.get(int), so there's no use suggesting Set can have array like properties.
Ultimately, many "grouping" classes use arrays under the hood to store their elements, but that doesn't mean you have to expose methods for accessing the array.
You can always implement it yourself. Granted, there is probably only one extremely limited case where it would be useful (and even then you could use better data structures anyway): a very large set that almost never changes. There, an array set would take up slightly less memory (no extra pointers) and enumerating the whole set would be ever so slightly faster. If you keep the array sorted, you can still get O(log n) search time.
However, those differences are purely academic. In the real world you would never really want such a beast.
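For illustration only, a read-only array-backed set with O(log n) lookups could be sketched like this. SortedArraySet is a made-up name, and the sketch is restricted to int elements to keep it short.

```java
import java.util.Arrays;

// Hypothetical immutable set of ints backed by a sorted array.
final class SortedArraySet {
    // Sorted ascending, no duplicates.
    private final int[] elements;

    SortedArraySet(int... values) {
        // Sort and de-duplicate once, at construction time.
        this.elements = Arrays.stream(values).sorted().distinct().toArray();
    }

    // Binary search over the sorted array: O(log n).
    boolean contains(int value) {
        return Arrays.binarySearch(elements, value) >= 0;
    }

    int size() {
        return elements.length;
    }
}
```

Adding or removing elements would require copying or shifting the array, which is why this only pays off for sets that almost never change.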
Consider indexed-tree-map , you will be able to access elements by index and get index of elements while keeping the sort order. Duplicates can be put into arrays as values under the same key.
I am looking for a way to determine if a Collection (or maybe even any Iterable) is guaranteed to be ordered by its class contract.
I already know the Guava method: Ordering.natural().isOrdered(myCollection)
But this method is not relevant to my needs, because it checks whether the values inside the collection are ordered. That's not what I need to determine. What I want is an isSorted method that behaves like this:
isSorted(new HashSet()) -> false
isSorted(new ArrayList()) -> true
etc...
What I am looking for would typically be implemented by checking the class of the collection against some kind of reference table of the collections whose contract states that they are ordered, returning true only for those.
Do you know if something like this already exists in some library?
You can do the following to determine if a collection is defined to be sorted.
collection instanceof SortedSet
There are three interfaces for ordered collections: List, SortedSet and SortedMap. You can check if your class is implementing one of them.
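A heuristic along those lines might be sketched as follows. The class and method names are hypothetical, and the check only knows about the three interfaces named above, so a LinkedHashSet, for example, is reported as unordered even though it iterates in insertion order.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.SortedSet;

class OrderContracts {
    // True when the object's type promises a defined iteration order
    // via List, SortedSet or SortedMap; false for everything else.
    static boolean hasDefinedOrder(Object container) {
        return container instanceof List
                || container instanceof SortedSet
                || container instanceof SortedMap;
    }
}
```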
No, this doesn't exist in any library, and for good reason.
That library would have to know all the collection types that are flying around. If you're using Apache Commons Collections, it'd have to know about all of those. If you're using Guava, it'd have to know about all of those. If someone comes along and introduces a new collection type, you're now going to reject that type, even if it's ordered.
It doesn't make sense to provide that method in a library that can't know what other libraries you might have with whatever other collection types might be out there.
In an end application, it might make sense to implement it, with the heuristic techniques you've already been describing.
It might help if we knew what you were actually trying to do with this method.
Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example: when would one use a Collection<T> instead of a List<T>,
and when would you use a Map<T, V> instead of a List<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks
You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.
You can take a look at sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)
This book is very good and covers both the collection framework and generics.
You can check the documentation of the java collection API.
Anyway a basic rule is : be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.
A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.
The decision depends on your data and your needs to use the data.
You should use a Map if you have data where you can identify each element with a specific key and want to access or find it by this key.
You take a List if you don't have a key but you're interested in the order of the elements, for example a bunch of Strings you want to store in the order the user entered them.
You take a Set if you don't want to store the same element twice.
Also relevant to your decision is whether you're working in a multithreaded environment. If many threads access the same list at the same time, you would rather take a Vector instead of an ArrayList.
By the way, for some collections it is useful if your data class implements an interface like Comparable, or at least overrides the equals method.
here you will find more information.
Most Java books will have a good explanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that explains the reasons why one collection is selected over another.
Head First Java also has a good introduction but may not tackle the problem of which to select.
The answer to your question depends on how you are going to use the data structure. To get a better idea of the possibilities, it is good to look at the whole hierarchy of collection interfaces. For simplicity's sake, I am restricting this discussion to the classic interfaces and ignoring all of the concurrent ones.
Collection
 +- List
 +- Set
     +- SortedSet
Map
 +- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag: it contains a number of values, but makes no further guarantees about them. A List is simply an ordered sequence of values, where the order is defined externally, not implicitly from the values themselves. A Set, on the other hand, is a group of values, no two of which are the same; however, they are not ordered, neither explicitly nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List: this depends largely on how you need to look up your data. If you need to do (effectively) random lookups using a key, then you should be using a Map, since its implementations give you either O(1) or O(log n) lookups. Searching through a List is O(n). If, however, you are performing some kind of "batch" process, that is, you are processing each and every item, then a List, or a Set if you need the uniqueness constraint, is more appropriate.
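The lookup trade-off can be sketched as follows. Item is a hypothetical stand-in for the question's V type with an integer id, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupDemo {
    /** Hypothetical element type with an id, standing in for V. */
    static final class Item {
        final int id;
        final String name;

        Item(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** O(n): scans the list until an element with the given id turns up. */
    static Item findInList(List<Item> items, int id) {
        for (Item item : items) {
            if (item.id == id) {
                return item;
            }
        }
        return null; // not found
    }

    /** Builds an index once so that later lookups by id are O(1) on average. */
    static Map<Integer, Item> indexById(List<Item> items) {
        Map<Integer, Item> byId = new HashMap<>();
        for (Item item : items) {
            byId.put(item.id, item);
        }
        return byId;
    }
}
```

If lookups by id are rare and the list is short, the linear scan is perfectly adequate; the Map pays off once lookups dominate.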
The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your users don't have to massage their data to use your method. They can pass you an ArrayList< A >, a LinkedHashSet< A > or any other Collection< A > and you will deal with it. On receiving the result from the method, they have a lot more information about what they can do with it (list-specific iterators, for example) than they would in version 2.
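A concrete method in the spirit of version 1 might look like this sketch. The name sortedCopy and its behaviour are invented for the example; it returns List rather than LinkedList, which is one step less specific than the signature above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class Signatures {
    // Generous input: any Collection will do.
    // Informative output: a List, so callers get indexing and a defined order.
    static <A extends Comparable<? super A>> List<A> sortedCopy(Collection<? extends A> col) {
        List<A> result = new ArrayList<>(col);
        Collections.sort(result);
        return result;
    }
}
```

Callers can hand in a HashSet, a LinkedList or any other Collection without converting it first, and still get back something they can index into.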