When to use Enum or Collection in Java


Under what circumstances is an enum more appropriate than, for example, a Collection that guarantees unique elements (an implementer of java.util.Set, I guess...)?
(This is kind of a follow up from my previous question)

Basically when it's a well-defined, fixed set of values which are known at compile-time.
You can use an enum as a set very easily (with EnumSet) and it allows you to define behaviour, reference the elements by name, switch on them etc.
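As a minimal sketch of those three points (named constants, behaviour on the enum, and set semantics via EnumSet), consider the following. The `Day` enum and the `DayDemo` class are hypothetical names chosen for illustration only.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum: a fixed pool of values known at compile time.
enum Day {
    MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY;

    // Behaviour defined directly on the enum.
    boolean isWeekend() {
        return this == SATURDAY || this == SUNDAY;
    }
}

class DayDemo {
    // EnumSet gives Set semantics restricted to the enum's constants.
    static final Set<Day> WORKING_DAYS = EnumSet.range(Day.MONDAY, Day.FRIDAY);

    // Enum constants can be used directly as switch labels.
    static String describe(Day day) {
        switch (day) {
            case SATURDAY:
            case SUNDAY:
                return "weekend";
            default:
                return "weekday";
        }
    }

    public static void main(String[] args) {
        System.out.println(WORKING_DAYS.contains(Day.SUNDAY));
        System.out.println(describe(Day.MONDAY));
        System.out.println(Day.SATURDAY.isWeekend());
    }
}
```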

When the elements are known up front and won't change, an enum is appropriate.
If the elements can change during runtime, use a Set.

I am no Java guru, but my guess is to use an enumeration when you want to guarantee a certain pool of values, and a collection when you want to guarantee uniqueness. An example would be to enumerate the days of the week (you can't have "funday") and to keep a collection of SSNs (a generic example, I know!).

Great responses - I'll try and summarise, if just for my own reference - it kinda looks like you should use enums in two situations:
All the values you need are known at compile time, and either or both of the following:
you want better performance than your usual collection implementations
you want to limit the potential values to those specified at compile time
With the Collection over enumeration links that Jon gave, you can get the benefits of enum performance and safety as an implementation detail without incorporating it into your overall design.
Community wiki'd, please do edit and improve if you want to!

Note: you can have both with an EnumSet.

In some situations your business requires the creation of new items, but at the same time business logic based on some fixed items. For the fixed ones you want an enum, the new ones obviously require some kind of collection/db.
I've seen projects use a collection for these kinds of items, resulting in business logic that depends on data the user can delete. Never do this; instead, create a separate enum for the fixed items and a collection for the others, just as required.
Another solution is to use a collection of immutable objects for the fixed values. These items could also reside in a db, but carry an extra flag so users cannot update or delete them.

Related

Benefits of using Enums over Collections

I'm trying to check if a user's account type matches one of several Strings.
There's debate in the office as to whether this should be represented as an enum with each entry containing a different string, or as a Set of Strings. Whilst the Set may be more efficient, an enum may be stylistically superior as it is clearer it is being used for logic flow.
What are the advantages of these two approaches?

Indeed, a Set<String> is more efficient in terms of performance when searching. However, I wouldn't expect that you have thousands of account types, but several, so you won't actually feel the difference when searching. There's one problem with this approach, though - you will be able to add any String to the Set, which is brittle.
My personal preference would be to use an enum, especially if you don't expect more account types to be introduced. And if you have a Set<AccountType>, you'll be restricted in the values you can add (i.e. you will be able to add only account types, not just anything, as with a Set<String>). The problem with this approach is the Open/Closed Principle: suppose you have a switch statement over an AccountType variable with all the corresponding cases. If you then introduce a new AccountType constant, you must change the switch statement (by adding a new case), which breaks the Open/Closed Principle. In that case the neatest design would be to have an abstract class or interface called AccountType, with all the specific account types as subclasses.
So, there are several approaches you can follow, but before picking one, you should try to answer for yourselves the question "How are we going to use it?"
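The interface-based design mentioned above could look roughly like the sketch below. All type and method names here (`AccountType`, `StandardAccount`, `AdminAccount`, `canManageUsers`, `AccessCheck`) are hypothetical and only illustrate the idea; they do not come from the question.

```java
// Hypothetical interface: each account type carries its own behaviour.
interface AccountType {
    String label();
    boolean canManageUsers();
}

final class StandardAccount implements AccountType {
    public String label() { return "standard"; }
    public boolean canManageUsers() { return false; }
}

final class AdminAccount implements AccountType {
    public String label() { return "admin"; }
    public boolean canManageUsers() { return true; }
}

class AccessCheck {
    // No switch statement: adding a new AccountType implementation
    // does not require editing this method.
    static String describe(AccountType type) {
        String suffix = type.canManageUsers() ? " (may manage users)" : " (may not manage users)";
        return type.label() + suffix;
    }
}
```

The trade-off is that the set of account types is no longer closed at compile time, so you lose the guarantee that only the listed values exist. An enum whose constants implement such an interface is a possible middle ground.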

An enum would be better since account types (typically) do not change dynamically. Furthermore, using an enum makes the types more precise - e.g. there's no way to mix up "Hello, World!" with an account type.

Enums are great because you get compile time checking. Invalid values simply won't compile so it 'fails fast'.
A collection of strings is great when you want to add another option without compiling/releasing a new version of your application. If, for instance, the valid options were configured in a database table.

It is worth noting that an enum's valueOf(String) method is implemented using the Enum.valueOf(Class,String) method, which in turn is implemented using a HashMap.
This basically means that looking up the account type from the string by using AccountTypes.valueOf() is an O(1) operation and quite as efficient as a set operation. You can then use the returned value (the actual enum object) in the program, with full type safety, faster comparisons, and all the other benefits of the enum.
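A small sketch of that lookup follows, assuming a hypothetical `AccountType` enum and helper class. `valueOf` throws `IllegalArgumentException` for an unknown name and `NullPointerException` for null, so untrusted input needs a guard.

```java
// Hypothetical enum; the constant names are assumptions for illustration.
enum AccountType { STANDARD, LOCAL_ADMIN, GLOBAL_ADMIN }

class AccountTypeParser {
    // Returns the matching constant, or null when the name is unknown or null.
    static AccountType parseOrNull(String name) {
        if (name == null) {
            return null;
        }
        try {
            return AccountType.valueOf(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```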

It sounds to me like the problem is that you are using a string to represent data that can only have a few valid, known values. A Set may be helpful to validate if the string value is valid, but it doesn't prevent it from becoming invalid.
My suggestion is to define an enum with the valid account types and use that in place of strings. If you have input coming from the outside that represents an account type, then put a static method on the enum like "fromString" which returns the appropriate enum instance, thereby shortening the window in which invalid data can be a consideration.
You can create a Set of AccountType enum instances without any extra work: enum constants already come with final equals, hashCode, and compareTo implementations, so they work in a HashSet or TreeSet as-is, and EnumSet is a compact Set implementation made specifically for them. This could be useful if you have classifications of account types that you need to check against. For example, if there are "Local Admins", "Global Admins", and "Security Admins", you could define a method isAdmin(AccountType t) which searches a Set of AccountTypes. For example:
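import java.util.EnumSet;
import java.util.Set;

// EnumSet avoids the anonymous subclass that double-brace initialization creates.
private static final Set<AccountType> ADMIN_ACCOUNT_TYPES = EnumSet.of(
        AccountType.LOCAL_ADMIN,
        AccountType.GLOBAL_ADMIN,
        AccountType.SECURITY_ADMIN);

// True if the given account type belongs to one of the admin groups.
public boolean isAdmin(AccountType t) {
    return ADMIN_ACCOUNT_TYPES.contains(t);
}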
Now, if you have a case where there are lots of different account types, with many groupings, and performance of lookups is a concern, this is how you could solve it.
Though to be honest, if you only have a few account types and they rarely change, this may be over-engineering it a bit. If there are only 2 account types, a simple if statement with equality check will be more efficient than a hash table lookup.
Again, performance may not really be a problem here. Don't over-optimize or optimize prematurely.

In my experience, I suggest using an enum in this case. Even MySQL supports an ENUM type for use cases where you want a column to accept values from an explicitly declared list.

I'd use a Map<String, AccountType> map = new HashMap<>(); for maximum efficiency.
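One way to read this suggestion is to keep the map inside the enum and expose it through a static lookup method. The sketch below assumes external labels such as "local-admin" that differ from the constant names; the labels, the `BY_LABEL` field, and `fromString` are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum with an external string label per constant.
enum AccountType {
    STANDARD("standard"),
    LOCAL_ADMIN("local-admin"),
    GLOBAL_ADMIN("global-admin");

    // Built once when the enum class is initialized, after all constants exist.
    private static final Map<String, AccountType> BY_LABEL = new HashMap<>();
    static {
        for (AccountType type : values()) {
            BY_LABEL.put(type.label, type);
        }
    }

    private final String label;

    AccountType(String label) {
        this.label = label;
    }

    // Converts outside input to an enum constant, rejecting unknown labels.
    static AccountType fromString(String label) {
        AccountType type = BY_LABEL.get(label);
        if (type == null) {
            throw new IllegalArgumentException("Unknown account type: " + label);
        }
        return type;
    }
}
```

When the labels are identical to the constant names, the built-in valueOf already does this job and the extra map is unnecessary.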

Whether or not to code to an interface when only certain implementations provide correct behavior

So, I know that coding to an interface (using an interface as a variable's declared type instead of its concrete type) is a good practice in OO code, for a bunch of reasons. This is seen a lot, for example, with Java collections. Well, is referring to an interface in your program still a good thing to do when only certain implementations of that interface provide correct behavior?
For example, I have a Java program. In that program, I have multiple sets of objects. I chose to use a Set, because I didn't want duplicate elements. However, I wanted a list's ordering property (i.e. maintain insertion order). Therefore, I am using a LinkedHashSet as the concrete Set type. One thing these sets are used for is computing a dot product involving the primitive fields of the objects contained in the sets, such as in (simplifying a bit):
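// E stands for the element type; it is assumed to expose getValue().
double dot(LinkedHashSet<E> set, double[] array) {
    double sum = 0.0;
    int i = 0;
    for (E element : set) {
        // Advance the index so each element pairs with its own array slot.
        sum += element.getValue() * array[i++];
    }
    return sum;
}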
This method's result is dependent on the set's iteration order, and so certain Set implementations, mainly HashSet, will give incorrect/unexpected results. Currently, I am using LinkedHashSet throughout my program as the declared type, instead of Set, to ensure correct behavior. However, that feels bad stylistically. What's the right thing to do here? Is it okay to use the concrete type in this case? Or maybe should I use Set as the type, but then state in the documentation which implementations will/won't produce correct behavior? I'm looking more for general input than anything specific to the scenario above. In particular, this should apply to really any scenario where you're using the ordering properties of a LinkedHashSet or TreeSet. How do you prevent unintended implementations from being used? Do you force it in the code (by ditching the interface), or do you specify it in the documentation? Or perhaps some other approach?

It is true that you should code to interfaces, but only if the assurances they make fit your needs. In your case, if you would only use Set then you are saying: I don't want duplicates, but I don't care about the order. You could also use a List and mean: I care about insertion order, but not about duplicates. There even is a SortedSet but it does not have the ordering you want. So in your case you can't replace LinkedHashSet by one of its interfaces without violating the Liskov substitution principle.
So I would argue that in your case you should stick to the implementation until you really need to switch to another one. With modern IDEs refactoring is not that hard anymore, so I would refrain from any premature generalization -- YAGNI and KISS.
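To make the point concrete, here is a minimal runnable sketch that keeps LinkedHashSet in the signature. It uses plain Double elements instead of the question's element type, and the size check is an added assumption rather than something from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

class DotProduct {
    // Declaring LinkedHashSet documents that insertion order matters here.
    static double dot(LinkedHashSet<Double> values, double[] weights) {
        if (values.size() != weights.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        int i = 0;
        for (double value : values) {
            sum += value * weights[i++];
        }
        return sum;
    }

    public static void main(String[] args) {
        LinkedHashSet<Double> values = new LinkedHashSet<>(Arrays.asList(3.0, 1.0, 2.0));
        // Iteration follows insertion order: 3*1 + 1*10 + 2*100
        System.out.println(dot(values, new double[] {1.0, 10.0, 100.0}));
    }
}
```

With a parameter of type Set, a caller could pass a HashSet and the pairing between elements and weights would no longer be defined.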

Very very great question. One solution is: Make another interface! Say one that extends SortedMap but has a getInsertionOrderIterator() method or an interface that extends Map & has getOrderIterator() & getInsertionOrderIterator() methods.
You can write a quick adapter class that contains a LinkedHashMap & TreeMap as the backend data structures.

You can make arguments for either way. As long as you and others maintaining this code know that particular implementations of Set might break the rest of the app or library, then coding to the interface is fine. However, if that is not true, then you should use the specific implementation.
The purpose of coding to an interface is to give you flexibility that will not break your app. Take JDBC for instance. If you use the wrong driver it will break your program similar to how you are describing here. However, if let's say Oracle decided to put behavior in their JDBC driver that subtly broke code written to the JDBC spec instead of the specific Oracle driver code then you'd have to choose.
There is no cut and dry, "this is always right" type of answer.

No Java implementations for arrayset

How come Java provides several different implementations of the Set type, including HashSet and TreeSet and not ArraySet?

A set based solely on an array of elements in no particular order would always have O(n) time for a containment check. It wouldn't be terribly useful, IMO. When would you want to use that instead of HashSet or TreeSet?
The most useful aspect of an array is that you can get to an element with a particular index extremely quickly. That's not terribly relevant when it comes to sets.

There is CopyOnWriteArraySet which is a set backed by an array.
This is not particularly useful as its performance is not great for large collections.

Android has android.util.ArraySet (introduced in API level 23) and android.util.ArrayMap (introduced in API level 19).

Actually, the concrete implementation of a Set should not matter to its users: any set stores elements and guarantees their uniqueness.
I cannot be sure, but it sounds like you want a Set implementation that preserves the order of its elements. If I am right, use LinkedHashSet.

Java provides multiple implementations of its collection interfaces so that you can pick the one that performs best for your use. ArrayList performs well on many List operations.
For Set operations, which always require uniqueness, different implementations offer better performance. If a set were implemented using a plain array, any modification would have to run through all the array elements to check whether the element is already in the set. HashSet and TreeSet simplify this check greatly.

The Set interface has no get-by-index method, such as List.get(int), so there's no use suggesting Set can have array like properties.
Ultimately, all "grouping" classes use arrays under the hood to store their elements, but that doesn't mean you have to expose methods for accessing the array.

You can always implement it yourself. Granted, there is probably only one extremely limited case where it would be useful (and in that case you could use better data structures anyway): a very large set that almost never changes. There, an array set would take up slightly less memory (no extra pointers) and enumeration of the whole set would be ever so slightly faster. If you keep the array sorted, you can still get O(log n) search time.
However, those differences are purely academic. In the real world you would never really want such a beast.
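For the curious, a sorted-array set of that kind can be sketched in a few lines. `SortedIntArraySet` is a hypothetical, immutable class written for this illustration; it is not part of the JDK.

```java
import java.util.Arrays;

// Hypothetical immutable set of ints backed by a sorted array.
final class SortedIntArraySet {
    private final int[] elements;

    SortedIntArraySet(int... values) {
        // Sort and drop duplicates once, up front.
        elements = Arrays.stream(values).sorted().distinct().toArray();
    }

    // Binary search over the sorted array: O(log n) per lookup.
    boolean contains(int value) {
        return Arrays.binarySearch(elements, value) >= 0;
    }

    int size() {
        return elements.length;
    }
}
```

Adding or removing an element would mean copying and shifting the array, which is why this layout only suits sets that almost never change.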

Consider indexed-tree-map: you will be able to access elements by index and get the index of an element while keeping the sort order. Duplicates can be put into arrays as values under the same key.

What would be a good way to implement a set collection with weak references, compares by reference, and is also sortable in Java?

I want to have an object that allows other objects of a specific type to register themselves with it. Ideally it would store the references to them in some sort of set collection and have .equals() compare by reference rather than value. It shouldn't have to maintain a sort at all times, but it should be able to be sorted before the collection is iterated over.
Looking through the Java Collection Library, I've seen the various features I'm looking for on different collection types, but I am not sure about how I should go about using them to build the kind of collection I'm looking for.
This is Java in the context of Android if that is significant.

Java's built-in tree-based collections won't work.
To illustrate, consider a tree containing weak references to nodes 'B', 'C', and 'D':
  C
 / \
B   D
Now let the weak reference 'C' get collected, leaving null behind:
  -
 / \
B   D
Now insert an element into the tree. The TreeMap/TreeSet doesn't have sufficient information to select the left or right subtree. If your comparator says null is a small value, then it will be incorrect when inserting 'A'. If it says null is a large value, it will be incorrect when inserting 'E'.
Sort on demand is a good choice.
A more robust solution is to use an ArrayList<WeakReference<T>> and to implement a Comparator<WeakReference<T>> that delegates to a Comparator<T>. Then call Collections.sort() prior to iteration.
Android's Collections.sort uses TimSort behind-the-scenes and so it runs quite efficiently if the input is already partially sorted.
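A rough sketch of the sort-on-demand idea follows. Note one deliberate variation: rather than sorting the weak references themselves with a delegating comparator, it first copies the still-live objects into a list of strong references and sorts that, which avoids a referent being cleared in the middle of a sort. The `WeakRegistry` class and its method names are hypothetical, and the sketch is not thread-safe.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical registry: weak references, identity-based duplicate check.
final class WeakRegistry<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();

    // Registers an object unless the same instance (==) is already present.
    void register(T item) {
        for (WeakReference<T> ref : refs) {
            if (ref.get() == item) {
                return;
            }
        }
        refs.add(new WeakReference<>(item));
    }

    // Prunes cleared references, then returns the live objects sorted on demand.
    List<T> sortedSnapshot(Comparator<? super T> order) {
        List<T> live = new ArrayList<>();
        for (Iterator<WeakReference<T>> it = refs.iterator(); it.hasNext(); ) {
            T item = it.next().get();
            if (item == null) {
                it.remove();
            } else {
                live.add(item);
            }
        }
        Collections.sort(live, order);
        return live;
    }
}
```

The linear scan in register is O(n) per call, which is an acceptable cost only for small registries.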

Perhaps the collections classes are a level of abstraction below what you're looking for? It sounds like the end product you want is a cache with the ability to iterate in a user-defined sort order. If so, perhaps the cache interface in the Google Guava library is close enough to what you want:
http://code.google.com/p/guava-libraries/source/browse/trunk/guava/src/com/google/common/cache/Cache.java
At a glance, it looks like CacheBuilder in that package doesn't allow you to build an implementation with user-defined iteration order. However, it does provide a Map view that might be good enough for your needs:
List<Thing> cachedThings = Lists.newArrayList(cache.asMap().values());
Collections.sort(cachedThings, YOUR_THING_COMPARATOR);
for (Thing thing : cachedThings) { ... }
Even if this isn't exactly what you want, the classes in that package might give you some useful insights re: using References with Collections.

DISCLAIMER: This was a comment but it got kinda big, sorry if it doesn't solve your problem:
References in Java
Just to clarify what I mean when I say reference, since it isn't really a term commonly used in Java: Java does not really use references or pointers. It uses a kind of pseudo-reference that can be (and is by default) assigned to the special null instance. That's one way to explain it anyway. In Java, these pseudo-references are the only way that an Object can be handled. When I say reference, I mean these pseudo-references.
Sets
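No Set implementation will allow two references to the same object to be included in it, since that would violate the mathematical concept of a set. Note that the duplicate check goes through equals() (plus hashCode() or a comparator, depending on the implementation), which only amounts to an identity comparison when the class does not override equals(). The Java Sets ignore any attempt to add a duplicate.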
You mention a Map in your comment though... Could you clarify what kind of collection you are after? And why you need that kind of equality checking within it? Are you thinking in C++ terms? I'll try to edit my answer to be more helpful then :)
EDIT: I thought that might have been your goal ;) So a TreeSet should do the trick then! I would not get concerned about performance until there is a performance issue. Simplicity is fantastic for readability, maintenance and preventing bugs. If performance does become a problem, ideally you should profile your code and only optimize the areas that are proven to be the problem.
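If comparison by reference is a hard requirement, one option in the standard library is a Set view over an IdentityHashMap. The sketch below contrasts it with HashSet; the `IdentitySetDemo` class and `newIdentitySet` helper are hypothetical names. Note that this set holds strong references, so it does not address the weak-reference part of the question.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // A Set backed by IdentityHashMap compares elements with == instead of equals().
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        String x = new String("same");
        String y = new String("same"); // equal to x, but a distinct object

        Set<String> byEquals = new HashSet<>();
        byEquals.add(x);
        byEquals.add(y);

        Set<String> byIdentity = newIdentitySet();
        byIdentity.add(x);
        byIdentity.add(y);

        System.out.println(byEquals.size());
        System.out.println(byIdentity.size());
    }
}
```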

When to use the various generic containers in Java

Does anyone know of any resources or books I can read to help me understand the various Java collection classes?
For example: when would one use a Collection<T> instead of a List<T>,
and when would you use a Map<T, V> instead of a List<V>, where V has a member getId as shown below, allowing you to search the list for the element matching a given key:
class V {
T getId();
}
thanks

You use a Map if you want a map-like behaviour. You use a List if you want a list-like behaviour. You use a Collection if you don't care.
Generics have nothing to do with this.
See the Collections tutorial.

You can take a look at the Sun tutorial. It explains everything in detail.
http://java.sun.com/docs/books/tutorial/collections/index.html (Implementation section explain the difference between them)

This book is very good and covers both the collection framework and generics.

You can check the documentation of the Java Collections API.
Anyway, a basic rule is: be as generic as possible for the type of your parameters. Be as generic as possible for the return type of your interfaces. Be as specific as possible for the return type of your final class.

A good place to start would be the Java API. Each collection has a good description associated with it. After that, you can search for any variety of articles and books on Java Collections on Google.

The decision depends on your data and your needs to use the data.
You should use a Map if you have data where you can identify each element by a specific key and want to access or find it with this key.
You take a List if you don't have a key but you're interested in the order of the elements, like a bunch of Strings you want to store in the order the user entered them.
You take a Set if you don't want to store the same element twice.
Another consideration is whether you're working in a multithreaded environment. If many threads access the same list at the same time, you would rather take a synchronized list such as Vector (or, in more recent code, the wrapper returned by Collections.synchronizedList) instead of a plain ArrayList.
Btw., for some collections it is useful if your data class implements an interface like Comparable, or at least overrides equals (and hashCode along with it).
here you will find more information.

Most Java books will have a good explanation of the Collections Framework. I find that Object-Oriented-Software-Development-Using has a good chapter that explains the reasons why one Collection is selected over another.
Head First Java also has a good introduction but may not tackle the problem of which to select.

The answer to your question is: how are you going to be using the data structure? To get a better idea of the possibilities, it is good to look at the whole hierarchy of collection interfaces. For simplicity's sake, I am restricting this discussion to the classic interfaces and ignoring all of the concurrent ones.
Collection
 +- List
 +- Set
     +- SortedSet
Map
 +- SortedMap
So, we can see from the above, a Map and a Collection are different things.
A Collection is somewhat analogous to a bag: it contains a number of values, but makes no further guarantees about them. A List is simply an ordered sequence of values, where the order is defined externally, not implicitly from the values themselves. A Set, on the other hand, is a group of values, no two of which are the same; however, they are not ordered, neither explicitly nor implicitly. A SortedSet is a set of unique values that are implicitly sorted, that is, if you iterate over the values, they will always be returned in the same order.
A Map is mapping from a Set of keys to values. A SortedMap is a mapping from a SortedSet of keys to values.
As to why you would want to use a Map instead of a List? This depends largely on how you need to look up your data. If you need to do (effectively) random lookups using a key, then you should be using a Map, since its implementations give you either O(1) or O(log n) lookups. Searching through a List is O(n). If, however, you are performing some kind of "batch" process, that is, you are processing each and every item, then a List, or a Set if you need the uniqueness constraint, is more appropriate.
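The difference between the two lookup styles can be sketched as follows. The `User` type stands in for the question's V with its getId member; it and the `Lookup` helper are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type with an id, mirroring the question's V and getId.
final class User {
    private final int id;
    private final String name;

    User(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class Lookup {
    // O(n): scan the list until an element with the given id turns up.
    static User findInList(List<User> users, int id) {
        for (User user : users) {
            if (user.getId() == id) {
                return user;
            }
        }
        return null;
    }

    // Build the index once; each later get() is an expected O(1) hash lookup.
    static Map<Integer, User> indexById(List<User> users) {
        Map<Integer, User> byId = new HashMap<>();
        for (User user : users) {
            byId.put(user.getId(), user);
        }
        return byId;
    }
}
```

Building the index costs one pass over the list, so it pays off when you look up by key more than a handful of times.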

The other answers already covered an overview of what the collections are, so I'd add one rule of thumb that applies to how you might use collections in your programming:
Be strict in what you send, but generous in what you receive
This is a little controversial (some engineers believe that you should always be as strict as possible) but it's a rule of thumb that, when applied to collections, teaches us to pick the collection that limits your users the least when taking arguments but gives as much information as possible when returning results.
In other words a method signature like:
LinkedList< A > doSomething(Collection< A > col);
Might be preferred to:
Collection< A > doSomething(LinkedList< A > list);
In version 1, your user doesn't have to massage their data to use your method. They can pass you an ArrayList< A >, a LinkedHashSet< A > or any other Collection< A > and you will deal with it. On receiving the data from the method, they have a lot more information about what they can do with it (list-specific iterators, for example) than they would in version 2.
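A small sketch of a method written in that spirit is shown below. The `Sorting` class and `sortedCopy` method are hypothetical. It returns List rather than a concrete class such as LinkedList, which is a slightly more conservative choice than the signature above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class Sorting {
    // Accepts any Collection; returns a List so callers get ordering and index access.
    static <T extends Comparable<? super T>> List<T> sortedCopy(Collection<? extends T> input) {
        List<T> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }
}
```

A caller can hand in a Set, a List, or any other Collection without converting it first, and gets back something it can index into.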
