To store unique element in a collection with natural order

While I was solving a Java test I came up with the following question:
You need to store elements in a collection that guarantees that no
duplicates are stored and all elements can be accessed in natural
order. Which interface provides that capability?
A. java.util.Map
B. java.util.Set
C. java.util.List
D. java.util.Collection
I have no idea which answer is right here. Every one of these collections except a Set can store the same element more than once, but a Set doesn't provide natural ordering. What am I missing?

The correct answer for that test is Set. Remember that the question asks for an interface that could provide that capability; given the right implementation, the Set interface could provide it.
The Map interface doesn't make any guarantees about the order in which things are stored, as that's implementation specific. However, if you use the right implementation (that is, TreeMap, as spelled out by the docs), then you're guaranteed natural ordering of the keys and no duplicate keys.
However, the question doesn't ask for key-value pairs, so Map isn't the best fit.
The Set interface also doesn't make any guarantees around what order things are stored in, as that's implementation specific. But, like TreeMap, TreeSet is a set that can be used to store things in a natural order with no duplicates.
Here's how it'd look.
Set<String> values = new TreeSet<>();
The List interface will definitely allow duplicates, which instantly rules it out.
The Collection interface doesn't have anything directly implementing it, but it is the patriarch of the entire collections hierarchy. So, in theory, code like this is legal:
Collection<String> values = new TreeSet<>();
...but you'd lose information about what kind of collection it actually was, so I'd discourage its usage.
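As a minimal sketch of the TreeSet behavior described above (the class and method names here are made up for illustration), the following drops duplicates and iterates in natural order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDemo {
    // Copies the input into a TreeSet and returns the elements in iteration order.
    static List<String> uniqueSorted(List<String> input) {
        // TreeSet drops duplicates and keeps Strings in natural (lexicographic) order.
        Set<String> values = new TreeSet<>(input);
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "fig", "apple");
        System.out.println(uniqueSorted(input)); // prints [apple, fig, pear]
    }
}
```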

TreeSet would give you ordering (either natural ordering by default or custom ordering via a Comparator).
More generally, SortedSet is the interface that offers both uniqueness and ordering. From its documentation:
A Set that further provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time. The set's iterator will traverse the set in ascending element order. Several additional operations are provided to take advantage of the ordering.
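To illustrate the quoted contract, here is a small sketch (the class name and sample values are assumptions for the example) showing natural ordering, a Comparator supplied at creation time, and a few of the extra operations SortedSet offers:

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedSetDemo {
    // Natural ordering: ascending for Integer.
    static SortedSet<Integer> ascending() {
        SortedSet<Integer> s = new TreeSet<>();
        s.add(30);
        s.add(10);
        s.add(20);
        s.add(10); // duplicate, ignored
        return s;
    }

    // Custom ordering: comparator supplied when the set is created.
    static SortedSet<Integer> descending() {
        SortedSet<Integer> s = new TreeSet<>(Comparator.<Integer>reverseOrder());
        s.addAll(ascending());
        return s;
    }

    public static void main(String[] args) {
        SortedSet<Integer> asc = ascending();
        System.out.println(asc);             // prints [10, 20, 30]
        System.out.println(asc.first());     // prints 10
        System.out.println(asc.headSet(20)); // elements strictly below 20: [10]
        System.out.println(descending());    // prints [30, 20, 10]
    }
}
```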

If by natural order you mean order of insertion, then LinkedHashSet is your go-to Set implementation.
Otherwise, the correct answers are:
SortedSet is the interface that guarantees natural ordering of the elements.
TreeSet is its typical implementation.
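The difference between the two readings of "order" can be sketched as follows (the helper names are hypothetical): LinkedHashSet keeps first-insertion order, while TreeSet sorts by natural order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class OrderingDemo {
    // Unique elements in the order they were first inserted.
    static List<String> insertionOrder(List<String> in) {
        return new ArrayList<>(new LinkedHashSet<>(in));
    }

    // Unique elements in natural (sorted) order.
    static List<String> naturalOrder(List<String> in) {
        return new ArrayList<>(new TreeSet<>(in));
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("b", "c", "a", "b");
        System.out.println(insertionOrder(in)); // prints [b, c, a]
        System.out.println(naturalOrder(in));   // prints [a, b, c]
    }
}
```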

Strictly speaking, of the interfaces listed above, List is the only one with a defined order of iteration; however, it does allow duplicates.
Set and Map, on the other hand, do not allow duplicates (of keys, for Map), but they also do not define the order of iteration. They are unordered by contract, with HashSet and HashMap being the usual examples of implementations with no defined order.
Collection guarantees neither.
So, strictly speaking, none of the suggested interfaces provides the desired capability.
However, as others suggested, there are specific subtypes that do provide natural ordering of elements and no duplicates, mainly the SortedSet interface and its TreeSet implementation.
To elaborate on why Set is not a good option: if you have a variable, call it mySet, and you want it to be ordered, users are going to be surprised when you declare it with the Set interface. Imagine the following code:
public int processMyDataStructure(Set<?> set) {
    int result = 0;
    // some calculation that assumes set is ordered
    return result;
}
If users pass you a HashSet as the argument, your method is going to behave incorrectly, because Set does not guarantee ordering. To avoid this, you should ask for a SortedSet rather than a Set.
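A sketch of that advice, with a made-up order-dependent calculation (largestGap is hypothetical and not part of the original question): by declaring the parameter as SortedSet, a HashSet is rejected at compile time.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedParamDemo {
    // Relies on iteration in sorted order to compare neighbouring elements.
    // Assumes the set uses natural (ascending) ordering.
    static int largestGap(SortedSet<Integer> set) {
        int largest = 0;
        Integer previous = null;
        for (Integer current : set) {
            if (previous != null) {
                largest = Math.max(largest, current - previous);
            }
            previous = current;
        }
        return largest;
    }

    public static void main(String[] args) {
        SortedSet<Integer> values = new TreeSet<>(Arrays.asList(1, 10, 4));
        System.out.println(largestGap(values)); // iterates 1, 4, 10 and prints 6
        // largestGap(new java.util.HashSet<>(values)); // would not compile
    }
}
```

Note that a SortedSet built with a custom Comparator is still accepted, so the method should document which ordering it expects.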

I came across this question yesterday in an interview test and need to comment on it: the question (assuming one of the listed answers A, B, C or D has to be correct) is plainly wrong. There is no correct answer listed.
Nothing in the Set interface guarantees the order in which the elements are returned. And, contrary to what Makoto suggests in the accepted answer, it does not help that the right implementation could theoretically do the job, because we are not asked about any implementation here, but about whether the interface provides the requested capability.
So, the test question with the answers provided is misguided.
There is one more reason the accepted answer is wrong. Specifically, Makoto argues that the List interface will definitely allow duplicates, which instantly rules it out. This argument can be countered by a quote from the List specification:
It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare
So in my opinion, all of the answers given are equally wrong, or, as the accepted answer would have it, equally correct, since we are free to write an implementation of List (or Map, or Collection) that behaves in any way we wish (within the boundaries set by the interface specification). But interfaces and their specifications exist to guarantee a contract, and this question is really about them, not about possible implementations.

Related

Whether or not to code to an interface when only certain implementations provide correct behavior

So, I know that coding to an interface (using an interface as a variable's declared type instead of its concrete type) is a good practice in OO code, for a bunch of reasons. This is seen a lot, for example, with Java collections. Well, is referring to an interface in your program still a good thing to do when only certain implementations of that interface provide correct behavior?
For example, I have a Java program. In that program, I have multiple sets of objects. I chose to use a Set, because I didn't want duplicate elements. However, I wanted a list's ordering property (i.e. maintain insertion order). Therefore, I am using a LinkedHashSet as the concrete Set type. One thing these sets are used for is computing a dot product involving the primitive fields of the objects contained in the sets, such as in (simplifying a bit):
double dot(LinkedHashSet<E> set, double[] array) {
    double sum = 0.0;
    int i = 0;
    for (E element : set) {
        // advance the array index with each element; the pairing depends on iteration order
        sum += element.getValue() * array[i++];
    }
    return sum;
}
This method's result is dependent on the set's iteration order, and so certain Set implementations, mainly HashSet, will give incorrect/unexpected results. Currently, I am using LinkedHashSet throughout my program as the declared type, instead of Set, to ensure correct behavior. However, that feels bad stylistically. What's the right thing to do here? Is it okay to use the concrete type in this case? Or maybe should I use Set as the type, but then state in the documentation which implementations will/won't produce correct behavior? I'm looking more for general input than anything specific to the scenario above. In particular, this should apply to really any scenario where you're using the ordering properties of a LinkedHashSet or TreeSet. How do you prevent unintended implementations from being used? Do you force it in the code (by ditching the interface), or do you specify it in the documentation? Or perhaps some other approach?

It is true that you should code to interfaces, but only if the assurances they make fit your needs. In your case, if you would only use Set then you are saying: I don't want duplicates, but I don't care about the order. You could also use a List and mean: I care about insertion order, but not about duplicates. There even is a SortedSet but it does not have the ordering you want. So in your case you can't replace LinkedHashSet by one of its interfaces without violating the Liskov substitution principle.
So I would argue that in your case you should stick to the implementation until you really need to switch to another one. With modern IDEs refactoring is not that hard anymore, so I would refrain from doing any premature optimizations -- YAGNI and KISS.

Great question. One solution is: make another interface! Say, one that extends SortedMap but has a getInsertionOrderIterator() method, or an interface that extends Map and has getOrderIterator() and getInsertionOrderIterator() methods.
You can write a quick adapter class that contains a LinkedHashMap & TreeMap as the backend data structures.

You can make arguments for either way. As long as you and others maintaining this code know that particular implementations of Set might break the rest of the app or library, then coding to the interface is fine. However, if that is not true, then you should use the specific implementation.
The purpose of coding to an interface is to give you flexibility that will not break your app. Take JDBC for instance. If you use the wrong driver it will break your program similar to how you are describing here. However, if let's say Oracle decided to put behavior in their JDBC driver that subtly broke code written to the JDBC spec instead of the specific Oracle driver code then you'd have to choose.
There is no cut and dry, "this is always right" type of answer.

Two java.util.Iterators to the same collection: do they have to return elements in the same order?

This is more of a theoretical question. If I have an arbitrary collection c that isn't ordered and I obtain two java.util.Iterators by calling c.iterator() twice, do both iterators have to return c's elements in the same order?
I mean, in practice they probably always will, but are they forced to do so by contract?
Thanks,
Jan

No they are not.
"There are no guarantees concerning the order in which the elements are
returned (unless this collection is an instance of some class that
provides a guarantee)."
See the Collection#iterator api contract.
That includes from one iterator to the next (as it doesn't say anything about requiring that).
Also consider that something could have changed in the underlying collection between getting those two iterators! Something added or removed.
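If code needs several passes in the same order over a collection that makes no ordering guarantee, one option is to copy it into a List once and iterate over the copy. A minimal sketch (snapshot and sameOrder are hypothetical helper names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class StableOrderDemo {
    // Copies the collection once; a List has a defined iteration order,
    // so every later pass over the copy sees the same sequence.
    static <T> List<T> snapshot(Collection<T> c) {
        return new ArrayList<>(c);
    }

    // Walks two iterators in lockstep and reports whether they yield the same sequence.
    static <T> boolean sameOrder(Iterator<T> a, Iterator<T> b) {
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        return !a.hasNext() && !b.hasNext();
    }

    public static void main(String[] args) {
        Set<Integer> unordered = new HashSet<>(Arrays.asList(3, 1, 2));
        List<Integer> fixed = snapshot(unordered);
        System.out.println(sameOrder(fixed.iterator(), fixed.iterator())); // prints true
    }
}
```

The copy does not track later changes to the original collection, which may or may not be what you want.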

Iterator implementations are provided by the specific Collection class. An Iterator for a List will return the elements in order, while one for a Set will not necessarily do so.

Because most data structures are not ordered by default, it is not certain that they will iterate in the same order.
If you want the same order, you have to sort the collection first.

ConcurrentSkipListSet and re-sorting (java)

I am using a ConcurrentSkipListSet that is, obviously, accessed by multiple threads. Now, the values used by the compareTo method of the underlying objects change over time. Because of this, I want to 'update' the ordering of the list (by re-sorting it, or something similar).
However, java.util.Collections.sort(list) doesn't work, and just rebuilding the list is probably too slow (and would mess up the whole concurrency-proofness). Is there any other solution I should look at?
It does not have to lead to an optimal sort (which is near-impossible with concurrency and changing values anyway). Near optimal would suffice, as long as any remove/add-calls remain thread-proof (this would be a real issue when rebuilding the list when sorting).

Every time you edit an item such that its sort order may change, you have to remove it from the list, then change the key, and then re-insert it.
Dr Cliff Click at Azul Systems has a very nice presentation of how they do lock-free hash-tables using tombstones and such. If you go towards writing your own skip-list/tree to make the reordering of an item into a single - and hopefully faster - op, then you might also go this lock-free route too. And be sure to share your results :)

These types of collections in the Java API do not support mutable elements (i.e. elements whose compareTo result changes). As such, the only way to do it is to re-assemble a new list in an atomic way or, as Will suggests, to perform a remove, mutate and re-insert of the element.
HashSet has the same problem: the hash bucket is calculated when an object is inserted, so set.contains( ... ) will no longer find the object if you mutate it in a way that changes its hash code.
To be exact, collections like ConcurrentSkipListSet and HashSet perform their comparisons/hashing on insertion and removal. The only collections that 'support' mutable elements do not perform special insertion logic based on the state of the elements (e.g. an ArrayList).
The documentation for the Set interface states:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
and the documentation for the SortedSet interface states:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
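The remove, mutate, re-insert approach suggested in the answers could look roughly like this. This is only a sketch under assumptions: Task, BY_PRIORITY and updatePriority are invented for the example, and the three steps are not atomic, so other threads may briefly not see the element in the set.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

class ReinsertDemo {
    // Hypothetical element type whose sort key (priority) can change.
    static final class Task {
        final String name;
        volatile int priority;

        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Orders by priority, then by name so distinct tasks never compare as equal.
    static final Comparator<Task> BY_PRIORITY =
            Comparator.comparingInt((Task t) -> t.priority)
                      .thenComparing((Task t) -> t.name);

    // Removes the task while its old key is still valid, changes the key,
    // then re-inserts it so the set places it at its new position.
    // Does nothing if the task is not currently in the set.
    static void updatePriority(ConcurrentSkipListSet<Task> set, Task task, int newPriority) {
        if (set.remove(task)) {
            task.priority = newPriority;
            set.add(task);
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Task> set = new ConcurrentSkipListSet<>(BY_PRIORITY);
        Task a = new Task("a", 1);
        Task b = new Task("b", 2);
        set.add(a);
        set.add(b);
        updatePriority(set, a, 5);
        System.out.println(set.first().name); // prints b
        System.out.println(set.last().name);  // prints a
    }
}
```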

List without duplicates?

I just read the answer to "Is there a no-duplicate List implementation out there?", which is about a List (NOT Set) implementation that doesn't allow duplicates. The accepted answer recommended the Collections15 SetUniqueList. Is there an equivalent in other libraries (in Guava, perhaps? I searched the docs and couldn't find one), or is there some other currently popular solution?

I like to use the correct Java semantics when I have a Collection that should not have any duplicates, and that means using the Set interface. That said, in many cases you want to preserve the insertion order like a List would, for convenience; maintaining two parallel data structures seems wasteful and complicated to keep in sync. That is why I use something like this: InsertionOrderSet.java. This is a special implementation of SortedSet that uses a wrapper object to maintain an index that can be sorted on by a comparator, but hides that implementation detail from external consumers, so it just looks like a regular old type-safe SortedSet.

The stuff in the answer you linked seems quite adequate. I'd go with one of those solutions. Alternatively, you could simply extend class ArrayList and override some of the methods that do a check against a HashSet prior to calling super.method for that same method. Simply add the Set as an instance field, use it for checking dupes and add/remove whatever items get added to/removed from the list.
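A rough sketch of the HashSet-backed idea follows. Note that it uses composition rather than extending ArrayList as suggested above, because a subclass would have to override every mutating method (add at index, addAll, set, iterator removal and so on) to stay consistent. UniqueList is a hypothetical name, not a class from any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueList<E> {
    private final List<E> items = new ArrayList<>(); // keeps insertion order
    private final Set<E> seen = new HashSet<>();     // fast duplicate check

    // Appends the element unless it is already present; returns true if it was added.
    boolean add(E e) {
        if (!seen.add(e)) {
            return false;
        }
        items.add(e);
        return true;
    }

    // Removes the element from both structures; returns true if it was present.
    boolean remove(E e) {
        if (!seen.remove(e)) {
            return false;
        }
        items.remove(e); // resolves to List.remove(Object), a linear scan
        return true;
    }

    E get(int index) {
        return items.get(index);
    }

    int size() {
        return items.size();
    }

    // Read-only view, so callers cannot bypass the duplicate check.
    List<E> asList() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<>();
        list.add("a");
        list.add("b");
        list.add("a"); // rejected as a duplicate
        System.out.println(list.asList()); // prints [a, b]
    }
}
```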

When to use the various generic containers in Java

Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example: when would one use a Collection<T> instead of a List<T>,
and when would you use a Map<T, V> instead of a List<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks

You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.

You can take a look at the Sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)

This book is very good and covers both the collection framework and generics.

You can check the documentation of the java collection API.
Anyway, a basic rule is: be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.

A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.

The decision depends on your data and your needs to use the data.
You should use a Map if you have data where you can identify each element with a specific key and want to access or find it by this key.
You take a List if you don't have a key but you're interested in the order of the elements, like a bunch of Strings you want to store in the order the user entered them.
You take a Set if you don't want to store the same element twice.
Whether you're working in a multithreaded environment also matters for your decision. If many threads are accessing the same list at the same time, you would rather take a Vector instead of an ArrayList.
By the way, for some collections it is useful if your data class implements an interface like Comparable, or at least overrides the equals method.
Here you will find more information.

Most Java books will have a good explanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that explains the reasons why one Collection is selected over another.
Head First Java also has a good introduction but may not tackle the problem of which to select.

The answer to your question is how are you going to be using the data structure? And to get a better idea of the possibilities, it is good to look at the whole collections interfaces hierarchy. For simplicity sake, I am restricting this discussion only to the classic interfaces, and am ignoring all of the concurrent interfaces.
Collection
+- List
+- Set
   +- SortedSet
Map
+- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag, it contains a number of values, but makes no further guarantees about them. A list is simply an ordered set of values, where the order is defined externally, not implicitly from the values themselves. A Set on the other hand is a group of values, no two of which are the same, however they are not ordered, neither explicitly, nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, then they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List: this depends largely on how you need to look up your data. If you need to do (effectively) random lookups using a key, then you should be using a Map, since its implementations give you either O(1) or O(log n) lookups. Searching through a list is O(n). If, however, you are performing some kind of "batch" process, that is, you are processing each and every item, then a List, or a Set if you need the uniqueness constraint, is more appropriate.
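To make the lookup difference concrete, here is a small sketch based on the getId example from the question. User, findInList and indexById are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupDemo {
    // Hypothetical value type with an id, like V.getId() in the question.
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() {
            return id;
        }
    }

    // Linear search: O(n), every lookup walks the list.
    static User findInList(List<User> users, int id) {
        for (User u : users) {
            if (u.getId() == id) {
                return u;
            }
        }
        return null;
    }

    // Builds an index once; each later HashMap lookup is O(1) on average.
    static Map<Integer, User> indexById(List<User> users) {
        Map<Integer, User> byId = new HashMap<>();
        for (User u : users) {
            byId.put(u.getId(), u);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User(1, "alice"), new User(2, "bob"));
        System.out.println(findInList(users, 2).name);    // prints bob
        System.out.println(indexById(users).get(2).name); // prints bob
    }
}
```

Using a TreeMap instead of a HashMap would give O(log n) lookups with the keys kept in sorted order.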

The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your user doesn't have to massage their data to use your method. They can pass you an ArrayList< A >, a LinkedHashSet< A > or any other Collection< A > and you will deal with it. On receiving the data from the method, they have a lot more information about what they can do with it (list-specific iterators, for example) than they would in version 2.
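A concrete sketch of the first signature (upperCased is an invented method used only to give the signature a body): the caller can pass any Collection and gets back the full LinkedList API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Locale;

class SignatureDemo {
    // Generous input (any Collection), specific output (LinkedList).
    static LinkedList<String> upperCased(Collection<String> col) {
        LinkedList<String> out = new LinkedList<>();
        for (String s : col) {
            out.add(s.toUpperCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // A List and a Set are both accepted without conversion.
        LinkedList<String> fromList = upperCased(Arrays.asList("a", "b"));
        LinkedList<String> fromSet = upperCased(new LinkedHashSet<>(Arrays.asList("x", "y")));
        // The caller can use LinkedList-specific methods on the result.
        System.out.println(fromList.getFirst()); // prints A
        System.out.println(fromSet.getLast());   // prints Y
    }
}
```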
