I am looking for a way to determine if a Collection (or maybe even any Iterable) is guaranteed to be ordered by its class contract.
I already know the Guava method : Ordering.natural().isOrdered(myCollection)
But this method is not relevant to my needs, because it checks if the values inside the collection are ordered. That's not what I need to determine, what I want to have is a isSorted method that will behave like this :
isSorted(new HashSet()) -> false
isSorted(new ArrayList()) -> true
etc...
What I am looking at would be typically implemented by checking the class of the collection, and comparing it to some kind of reference table of the collections which contract states that they are ordered, and only return true for these ones.
Do you know if something like this already exists in some library ?
You can do the following to determine if a collection is defined to be sorted.
collection instanceof SortedSet
There are three interfaces for ordered collections: List, SortedSet and SortedMap. You can check if your class is implementing one of them.
No, this doesn't exist in any library, and for good reason.
That library would have to know all the collection types that are flying around. If you're using Apache Commons Collections, it'd have to know about all of those. If you're using Guava, it'd have to know about all of those. If someone comes along and introduces a new collection type, you're now going to reject that type, even if it's ordered.
It doesn't make sense to provide that method in a library that can't know what other libraries you might have with whatever other collection types might be out there.
In an end application, it might make sense to implement it, with the heuristic techniques you've already been describing.
It might help if we knew what you were actually trying to do with this method.
Related
So, I know that coding to an interface (using an interface as a variable's declared type instead of its concrete type) is a good practice in OO code, for a bunch of reasons. This is seen a lot, for example, with Java collections. Well, is referring to an interface in your program still a good thing to do when only certain implementations of that interface provide correct behavior?
For example, I have a Java program. In that program, I have multiple sets of objects. I chose to use a Set, because I didn't want duplicate elements. However, I wanted a list's ordering property (i.e. maintain insertion order). Therefore, I am using a LinkedHashSet as the concrete Set type. One thing these sets are used for is computing a dot product involving the primitive fields of the objects contained in the sets, such as in (simplifying a bit):
double dot(LinkedHashSet<E> set, double[] array) {
double sum = 0.0;
int i = 0;
for(E element : set) {
sum += (element.getValue()*array[i]);
}
return sum;
}
This method's result is dependent on the set's iteration order, and so certain Set implementations, mainly HashSet, will give incorrect/unexpected results. Currently, I am using LinkedHashSet throughout my program as the declared type, instead of Set, to ensure correct behavior. However, that feels bad stylistically. What's the right thing to do here? Is it okay to use the concrete type in this case? Or maybe should I use Set as the type, but then state in the documentation which implementations will/won't produce correct behavior? I'm looking more for general input than anything specific to the scenario above. In particular, this should apply to really any scenario where you're using the ordering properties of a LinkedHashSet or TreeSet. How do you prevent unintended implementations from being used? Do you force it in the code (by ditching the interface), or do you specify it in the documentation? Or perhaps some other approach?
It is true that you should code to interfaces, but only if the assurances they make fit your needs. In your case, if you would only use Set then you are saying: I don't want duplicates, but I don't care about the order. You could also use a List and mean: I care about insertion order, but not about duplicates. There even is a SortedSet but it does not have the ordering you want. So in your case you can't replace LinkedHashSet by one of its interfaces without violating the Liskov substitution principle.
So I would argue that in your case you should stick to the implementation until you really need the to switch to another implementation. With modern IDEs refactoring is not that hard anymore so I would refrain from doing any premature optimizations -- YAGNI and KISS.
Very very great question. One solution is: Make another interface! Say one that extends SortedMap but has a getInsertionOrderIterator() method or an interface that extends Map & has getOrderIterator() & getInsertionOrderIterator() methods.
You can write a quick adapter class that contains a LinkedHashMap & TreeMap as the backend data structures.
You can make arguments for either way. As long as you and others maintaining this code know that particular implementations of Set might break the rest of the app or library, then coding to the interface is fine. However, if that is not true, then you should use the specific implementation.
The purpose of coding to an interface is to give you flexibility that will not break your app. Take JDBC for instance. If you use the wrong driver it will break your program similar to how you are describing here. However, if let's say Oracle decided to put behavior in their JDBC driver that subtly broke code written to the JDBC spec instead of the specific Oracle driver code then you'd have to choose.
There is no cut and dry, "this is always right" type of answer.
Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
First variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?
These are design decisions, and one size usually doesn't fit all. Also the choice of what is used internally for the member variable can (and usually should be) different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe the performance characteristics without exposing the implementation details. The one interface that describes performance, RandomAccess is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, move from List to LinkedList if say efficient removal from front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
Collection list = new LinkedList(); //bad
This is bad because, you don't want this reference to refer say an HashSet(as HashSet also implements Collection and so does many other class's in the collection framework).
LinkedList list = new LinkedList(); //bad?
This is bad because, good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because point 2 days so.(Always Program To an Interface)
Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.
Sure. If for example java will find and implement more efficient implementation for the List collection, but you already have API that accepts only LinkedList, you won't be able to replace the implementation if you already have clients for this API. If you use interface, you can easily replace the implementation without breaking the APIs.
They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.
My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List just because the methods available on Collection aren't very powerful, and the distinction between a List and another Collection (Map, Set, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
I just read Is there a no-duplicate List implementation out there? answer about a List, NOT Set, implementation that doesn't allow duplicates. The accepted answer recommended the Collections15 SetUniqueList. Is there some equivalent -- in Guava perhaps? (I searched the docs and couldn't find) -- in other libs or is there some other currently popular solution?
I like to require the correct Java Semantic when I have a Collection that should not have any duplicates, and that is to use the Set interface. That said, in many cases you want to preserve the insertion order like a List would for convenience, maintaining two parallel data structures seems wasteful and complicated to keep in sync. That is why I use something like this. InsertionOrderSet.java This is a special implementation of SortedSet that uses a wrapper object to maintain and index that can be sorted on by a comparator, but hides that implementation detail from the external consumers so it just looks like a regular old type safe SortedSet.
The stuff in the answer you linked seems quite adequate. I'd go with one of those solutions. Alternatively, you could simply extend class ArrayList and override some of the methods that do a check against a HashSet prior to calling super.method for that same method. Simply add the Set as an instance field, use it for checking dupes and add/remove whatever items get added to/removed from the list.
I want to have an object that allows other objects of a specific type to register themselves with it. Ideally it would store the references to them in some sort of set collection and have .equals() compare by reference rather than value. It shouldn't have to maintain a sort at all times, but it should be able to be sorted before the collection is iterated over.
Looking through the Java Collection Library, I've seen the various features I'm looking for on different collection types, but I am not sure about how I should go about using them to build the kind of collection I'm looking for.
This is Java in the context of Android if that is significant.
Java's built-in tree-based collections won't work.
To illustrate, consider a tree containing weak references to nodes 'B', 'C', and 'D':
C
B D
Now let the weak reference 'C' get collected, leaving null behind:
-
B D
Now insert an element into the tree. The TreeMap/TreeSet doesn't have sufficient information to select the left or right subtree. If your comparator says null is a small value, then it will be incorrect when inserting 'A'. If it says null is a large value, it will be incorrect when inserting 'E'.
Sort on demand is a good choice.
A more robust solution is to use an ArrayList<WeakReference<T>> and to implement a Comparator<WeakReference<T>> that delegates to a Comparator<T>. Then call Collections.sort() prior to iteration.
Android's Collections.sort uses TimSort behind-the-scenes and so it runs quite efficiently if the input is already partially sorted.
Perhaps the collections classes are a level of abstraction below what you're looking for? It sounds like the end product you want is a cache with the ability to iterate in a user-defined sort order. If so, perhaps the cache interface in the Google Guava library is close enough to what you want:
http://code.google.com/p/guava-libraries/source/browse/trunk/guava/src/com/google/common/cache/Cache.java
At a glance, it looks like CacheBuilder in that package doesn't allow you to build an implementation with user-defined iteration order. However, it does provide a Map view that might be good enough for your needs:
List<Thing> cachedThings = Lists.newArrayList(cache.asMap().values());
Collections.sort(cachedThings, YOUR_THING_COMPARATOR);
for (Thing thing : cachedThings) { ... }
Even if this isn't exactly what you want, the classes in that package might give you some useful insights re: using References with Collections.
DISCLAIMER: This was a comment but it got kinda big, sorry if it doesn't solve your problem:
References in Java
Just to clarify what I mean when I say reference, since it isn't really a term commonly used in Java: Java does not really use references or pointers. It uses a kind of pseudo-reference that can be (and is by default) assigned to the special null instance. That's one way to explain it anyway. In Java, these pseudo-references are the only way that an Object can be handled. When I say reference, I mean these pseudo-references.
Sets
Any Set implementation will not allow two references to the same object to be included in it since it uses identity equality for this check. That violates the mathematical concept of a set. The Java Sets ignore any attempt to add duplicate references.
You mention a Map in your comment though... Could you clarify what kind of collection you are after? And why you need that kind of equality checking within it? Are you thinking in C++ terms? I'll try to edit my answer to be more helpful then :)
EDIT: I thought that might have been your goal ;) So a TreeSet should do the trick then! I would not get concerned about performance until there is a performance issue. Simplicity is fantastic for readability, maintenance and preventing bugs. If performance does become a problem, ideally you should profile your code and only optimize the areas that are proven to be the problem.
Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example:When would one use a Collection<T> instead of a List<T>
and when would you use a Map<T, V> instead of aList<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks
You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.
You can take a look at sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)
This book is very good and covers both the collection framework and generics.
You can check the documentation of the java collection API.
Anyway a basic rule is : be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.
A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.
The decision depends on your data and your needs to use the data.
You should use a map if you have data where you can identify each element with a specific key and want to access or find it by with this key.
You take a List if you don't have a key but you're interested in the order of the elements. like a bunch of Strings you want to store in the order the user entered it.
You take a Set if you don't want to store the same element twice.
Also interesting for your decision is if you're working in am multithreaded environment. So if many threads are accessing the same list at the same tame you would rather take a Vector instead of an ArrayList.
Btw. for some collections it is usefull if your data class implements an interface like comparable or at least overrides the equals function.
here you will find more information.
Most Java books will have a good expanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that expains the reasons why one Collection is selected over another.
The Head first Java also has a good intropduction but may not tackle the problem of which to select.
The answer to your question is how are you going to be using the data structure? And to get a better idea of the possibilities, it is good to look at the whole collections interfaces hierarchy. For simplicity sake, I am restricting this discussion only to the classic interfaces, and am ignoring all of the concurrent interfaces.
Collection
+- List
+- Set
+- SortedSet
Map
+- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag, it contains a number of values, but makes no further guarantees about them. A list is simply an ordered set of values, where the order is defined externally, not implicitly from the values themselves. A Set on the other hand is a group of values, no two of which are the same, however they are not ordered, neither explicitly, nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, then they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List? This depends largely on how you need to lookup your data. If you need to do (effectively) random lookups using a key, then you should be using a set, since the implementations of that give you either O(1) or O(lgn) lookups. Searching through the list is O(n). If however, you are performing some kind of "batch" process, that is you are processing each, and every, item in the list then a list, or Set if you need the uniqueness constraint, is more appropriate.
The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your user doesn't have to massage their data to use your method. They can pass you an ArrayList< A >, LinkedHashSet< A > or a Collection< A > and you will deal with. On receiving the data from the method, they have a lot more information in what they can do with it (list specific iterators for example) than they would in option 2.