Is a Collection better than a LinkedList?


Collection list = new LinkedList(); // Good?
LinkedList list = new LinkedList(); // Bad?
The first variant gives more flexibility, but is that all? Are there any other reasons to prefer it? What about performance?

These are design decisions, and one size usually doesn't fit all. Also, the type used internally for a member variable can (and usually should) be different from what is exposed to the outside world.
At its heart, Java's collections framework does not provide a complete set of interfaces that describe performance characteristics without exposing implementation details. The one interface that describes performance, RandomAccess, is a marker interface, and doesn't even extend Collection or re-expose the get(index) API. So I don't think there is a good answer.
As a rule of thumb, I keep the type as unspecific as possible until I recognize (and document) some characteristic that is important. For example, as soon as I want methods to know that insertion order is retained, I would change from Collection to List, and document why that restriction is important. Similarly, I would move from List to LinkedList if, say, efficient removal from the front becomes important.
When it comes to exposing the collection in public APIs, I always try to start exposing just the few APIs that are expected to get used; for example add(...) and iterator().
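A minimal sketch of that last point, assuming a hypothetical FooRegistry class (the name and the String element type are illustrative, not from the question). The field is typed as Collection, and the public surface is limited to add(...) and iterator():

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

// Hypothetical wrapper: callers can add and iterate, nothing else.
final class FooRegistry implements Iterable<String> {
    // Declared as Collection: nothing here depends on ordering or indexing yet.
    private final Collection<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    @Override
    public Iterator<String> iterator() {
        // Note: this iterator still supports remove(); wrap it if that matters.
        return items.iterator();
    }

    public static void main(String[] args) {
        FooRegistry registry = new FooRegistry();
        registry.add("a");
        registry.add("b");
        for (String s : registry) {
            System.out.println(s);
        }
    }
}
```

Because the concrete ArrayList never appears in the class's API, it could later be swapped for another Collection without touching callers.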

Collection list = new LinkedList(); //bad
This is bad because you don't want this reference to be able to refer to, say, a HashSet (HashSet also implements Collection, as do many other classes in the collections framework).
LinkedList list = new LinkedList(); //bad?
This is bad because good practice is to always code to the interface.
List list = new LinkedList();//good
This is good because the second point says so (always program to an interface).

Use the most specific type information on non-public objects. They are implementation details, and we want our implementation details as specific and precise as possible.

Sure. If, for example, Java gains a more efficient List implementation but your API accepts only LinkedList, you won't be able to switch implementations once clients depend on that API. If you use the interface, you can replace the implementation without breaking the API.

They're absolutely equivalent. The only reason to use one over the other is that if you later want to use a function of list that only exists in the class LinkedList, you need to use the second.

My general rule is to only be as specific as you need to be at the time (or will need to be in the near future, within reason). Granted, this is somewhat subjective.
In your example I would usually declare it as a List just because the methods available on Collection aren't very powerful, and the distinction between a List and another Collection (Map, Set, etc.) is often logically significant.
Also, in Java 1.5+ don't use raw types -- if you don't know the type that your list will contain, at least use List<?>.
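As a small illustration of the raw-type advice, here is a sketch with a hypothetical helper (WildcardDemo and countNonNull are made-up names). It takes List<?> so that it accepts a list of any element type while staying type-checked:

```java
import java.util.Arrays;
import java.util.List;

final class WildcardDemo {
    // Accepts a list of any element type; elements can only be read as Object.
    static int countNonNull(List<?> items) {
        int count = 0;
        for (Object item : items) {
            if (item != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("one", null, "three");
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(countNonNull(names));    // prints 2
        System.out.println(countNonNull(numbers));  // prints 3
    }
}
```

Unlike a raw List parameter, List<?> does not let the method add arbitrary objects to the caller's list.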

Related

Why does List.of() in Java not return a typed immutable list?

The list returned by the method List.of(E... elements) in Java is immutable, but this is not visible at all from the type of the created list. The list simply throws an exception when you try to change it, instead of not offering the possibility to change it at all.
My point is that List.of(E... elements) should return an ImmutableList that extends List. This way the user can decide whether he cares to show this fact of immutability or not.
But I can't find anybody complaining or showing alternative solutions. Even Guava and Apache Commons don't do this by default. Only Guava gives the possibility to create it (albeit with a lot of code):
List<String> list = new ArrayList<String>(Arrays.asList("one", "two", "three"));
ImmutableList<String> unmodifiableList = ImmutableList.<String>builder().addAll(list).build();
But even this class has a (deprecated) add and remove method.
Can anyone tell me why nobody cares about this (seemingly fundamental) issue?

It's not that nobody cares; it's that this is a problem of considerable subtlety.
The original reason there isn't a family of "immutable" collection interfaces is because of a concern about interface proliferation. There could potentially be interfaces not only for immutability, but synchronized and runtime type-checked collections, and also collections that can have elements set but not added or removed (e.g., Arrays.asList) or collections from which elements can be removed but not added (e.g., Map.keySet).
But it could also be argued that immutability is so important that it should be special-cased, and that there be support in the type hierarchy for it even if there isn't support for all those other characteristics. Fair enough.
The initial suggestion is to have an ImmutableList interface extend List, as
ImmutableList <: List <: Collection
(Where <: means "is a subtype of".)
This can certainly be done, but then ImmutableList would inherit all of the methods from List, including all the mutator methods. Something would have to be done with them; a sub-interface can't "disinherit" methods from a super-interface. The best that could be done is to specify that these methods throw an exception, provide default implementations that do so, and perhaps mark the methods as deprecated so that programmers get a warning at compile time.
This works, but it doesn't help much. An implementation of such an interface cannot be guaranteed to be immutable at all. A malicious or buggy implementation could override the mutator methods, or it could simply add more methods that mutate the state. Any programs that used ImmutableList couldn't make any assumptions that the list was, in fact, immutable.
A variation on this is to make ImmutableList be a class instead of an interface, to define its mutator methods to throw exceptions, to make them final, and to provide no public constructors, in order to restrict implementations. In fact, this is exactly what Guava's ImmutableList has done. If you trust the Guava developers (I think they're pretty reputable) then if you have a Guava ImmutableList instance, you're assured that it is in fact immutable. For example, you could store it in a field with the knowledge that it won't change out from under you unexpectedly. But this also means that you can't add another ImmutableList implementation, at least not without modifying Guava.
A problem that isn't solved by this approach is the "scrubbing" of immutability by upcasting. A lot of existing APIs define methods with parameters of type Collection or Iterable. If you were to pass an ImmutableList to such a method, it would lose the type information indicating that the list is immutable. To benefit from this, you'd have to add immutable-flavored overloads everywhere. Or, you could add instanceof checks everywhere. Both are pretty messy.
(Note that the JDK's List.copyOf sidesteps this problem. Even though there are no immutable types, it checks the implementation before making a copy, and avoids making copies unnecessarily. Thus, callers can use List.copyOf to make defensive copies with impunity.)
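A minimal sketch of such a defensive copy, assuming Java 10 or later and a hypothetical TagHolder class. The JDK documentation only says that copyOf will generally not create a copy when the argument is already an unmodifiable List, so the example does not rely on instance identity:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class, used only to illustrate a defensive copy.
final class TagHolder {
    private final List<String> tags;

    TagHolder(List<String> tags) {
        // Copies the elements unless the argument is already an unmodifiable
        // List, in which case the JDK will generally reuse it.
        // Note: copyOf rejects null elements.
        this.tags = List.copyOf(tags);
    }

    List<String> tags() {
        return tags;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("a", "b"));
        TagHolder holder = new TagHolder(source);
        source.add("c");                    // does not affect the holder
        System.out.println(holder.tags());  // prints [a, b]
    }
}
```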
As an alternative, one might argue that we don't want ImmutableList to be a sub-interface of List, we want it to be a super-interface:
List <: ImmutableList
This way, instead of ImmutableList having to specify that all those mutator methods throw exceptions, they wouldn't be present in the interface at all. This is nice, except that this model is completely wrong. Since ArrayList is a List, that means ArrayList is also an ImmutableList, which is clearly nonsensical. The problem is that "immutable" implies a restriction on subtypes, which can't be done in an inheritance hierarchy. Instead, it would need to be renamed to allow capabilities to be added as one goes down the hierarchy, for example,
List <: ReadableList
which is more accurate. However, ReadableList is altogether a different thing from an ImmutableList.
Finally, there are a bunch of semantic issues that we haven't considered. One concerns immutability vs. unmodifiability. Java has APIs that support unmodifiability, for example:
List<String> alist = new ArrayList<>(...);
??? ulist = Collections.unmodifiableList(alist);
What should the type of ulist be? It's not immutable, since it will change if somebody changes the backing list alist. Now consider:
???<String[]> arlist = List.of(new String[] { ... }, new String[] { ... });
What should the type be? It's certainly not immutable, as it contains arrays, and arrays are always mutable. Thus it's not at all clear that it would be reasonable to say that List.of returns something immutable.
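Both cases can be observed directly. The following sketch fills in the elided contents with made-up values (the class and method names are illustrative as well):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UnmodifiableVsImmutable {
    // An unmodifiable view still reflects changes made through the backing list.
    static int viewSizeAfterBackingAdd() {
        List<String> alist = new ArrayList<>(List.of("a", "b"));
        List<String> ulist = Collections.unmodifiableList(alist);
        alist.add("c");
        return ulist.size();
    }

    // List.of fixes the list structure, but the array elements stay writable.
    static String elementAfterArrayWrite() {
        List<String[]> arlist = List.of(new String[] {"x"}, new String[] {"y"});
        arlist.get(0)[0] = "changed";
        return arlist.get(0)[0];
    }

    public static void main(String[] args) {
        System.out.println(viewSizeAfterBackingAdd());  // prints 3
        System.out.println(elementAfterArrayWrite());   // prints changed
    }
}
```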

Removing add, remove, etc. from all the Collection types and creating subinterfaces MutableCollection, MutableList, MutableSet would double the number of Collection interfaces, which is a complexity cost to be considered. Furthermore, Collections aren't cleanly separated into Mutable and Immutable: Arrays.asList supports set, but not add.
Ultimately there's a tradeoff to be made about how much to capture in the type system and how much to enforce at runtime. Reasonable people can disagree as to where to draw the line.

I would say that since commonly collections tend to (or at least should) be treated as "immutable by default" (meaning you're rarely modifying collections that you didn't create), it's not very important to specify that "this is immutable". It would be more useful to specify "you can safely modify this collection if you wish".
Secondly, your suggested approach wouldn't work. You can't extend List and hide methods, so the only option would be to make it return an ImmutableList that's not a subtype of List. That would make it useless, as it would require a new ImmutableList interface, and any existing code wouldn't be able to use it.
So is this optimal design? No, not really, but for backwards compatibility that's not going to change.

Avoiding semantic coupling with Java Collection interfaces

These days I am reading the Code Complete book and I've passed the part about coupling-levels (simple-data-parameter, simple-object, object-parameter and semantic coupling). Semantic being the "worst" kind:
The most insidious kind of coupling occurs when one module
makes use not of some syntactic element of another module but of some semantic
knowledge of another module’s inner workings.
The examples in the book usually lead to run-time failures, and are typical bad code, but today I had a situation that I'm really not sure how to treat.
First I had a class, let's call it Provider fetching some data and returning a List of Foo's.
class Provider
{
    public List<Foo> getFoos()
    {
        ArrayList<Foo> foos = createFoos(); // some processing here where I create the objects
        return foos;
    }
}
A consuming class executes an algorithm, processing the Foo's, merging or removing from the List based on some attributes, not really important. The algorithm does all of its processing at the head of the list, so there are a lot of operations reading, removing and adding at the head.
(I just realized I could have made the algorithm look like merge sort, recursively calling it on halves of an array, but that doesn't answer my question :) )
I noticed I was returning an ArrayList, so I changed the providing class's getFoos method to return a LinkedList. ArrayList has O(n) complexity for head removals, while LinkedList has constant complexity. But it then struck me that I am possibly creating a semantic dependency. The code will certainly work with both implementations of List, and there are no side effects, but the performance will be degraded with the wrong one. I wrote both classes, so I can easily understand this, but what if a colleague had to implement the algorithm, or if he used the Provider as a source for another algorithm which favors random access? If he doesn't bother with the internals, as he should not, I would mess up his algorithm's performance.
Declaring the method to return LinkedList is also possible, but what about the "program to interface" principle?
Is there any way to handle this situation, or is my design flawed at its roots?

The general problem is, how does a producer return something in the form that the consumer prefers? Usually the consumer needs to include the preference in the request. For example
as a flag - getFoos(randomOrLinked)
different methods - getFoosAsArrayList(), getFoosAsLinkedList()
pass a function that creates desired List - getFoos(ArrayList::new)
or pass a desired output List - getFoos(new ArrayList())
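The third option in the list above could look roughly like this. It is a sketch only: FooProvider is a hypothetical name, and String stands in for the Foo type from the question.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

final class FooProvider {
    // The caller picks the list implementation by passing a factory.
    static <L extends List<String>> L getFoos(Supplier<L> listFactory) {
        L foos = listFactory.get();
        foos.add("foo-1");
        foos.add("foo-2");
        return foos;
    }

    public static void main(String[] args) {
        // Random-access consumer.
        ArrayList<String> indexed = getFoos(ArrayList::new);
        System.out.println(indexed.get(1));        // prints foo-2

        // Head-heavy consumer.
        LinkedList<String> linked = getFoos(LinkedList::new);
        System.out.println(linked.removeFirst());  // prints foo-1
    }
}
```

The provider stays ignorant of the consumer's access pattern, and the preference is visible at the call site instead of being hidden in the provider's internals.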
But, the producer may have the right to say, this is too complicated for me, I don't care. I'll return a form that's suitable for most use cases, and the consumer needs to handle it properly. If I think ArrayList is best, I'll just do it. (Actually you may have a better choice - a ring structure - that suits both of the two use cases in consideration)
Of course, it should be well documented. Or you could be honest and return ArrayList as the method signature, as long as you commit to it. Don't worry too much about "interface": ArrayList is an interface (in the general sense), Iterable is an interface, so what's so special about the List interface that sits between the two?
There can be another criticism of your design: you return a mutable data structure that the consumer can modify directly. That is less desirable than returning a read-only data structure. If you can, you should return a read-only view of the underlying data; the construction of the view should be inexpensive. The consumer needs to make its own copy if it needs a mutable one.
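One way to sketch the read-only-view idea is with Collections.unmodifiableList. ReadOnlyFooProvider is a hypothetical name, and String again stands in for Foo:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class ReadOnlyFooProvider {
    private final List<String> foos = new ArrayList<>(List.of("a", "b", "c"));

    // Cheap to create: a wrapper around the same backing list, not a copy.
    List<String> getFoos() {
        return Collections.unmodifiableList(foos);
    }

    public static void main(String[] args) {
        ReadOnlyFooProvider provider = new ReadOnlyFooProvider();
        // The consumer makes its own mutable copy, in the shape it prefers.
        Deque<String> work = new ArrayDeque<>(provider.getFoos());
        work.removeFirst();
        System.out.println(work.size());                // prints 2
        System.out.println(provider.getFoos().size());  // prints 3
    }
}
```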

You need to make a compromise somewhere. In this case, there are two compromises that make sense to me:
If you don't know that all consumers of your Provider will be performing operations that are appropriate to a LinkedList, then stick with the signature that has return type List with implementation that returns ArrayList (ArrayList is a good all-around List implementation). Then within the calling method, wrap the returned List in a LinkedList: LinkedList<Foo> fooList = new LinkedList<>(provider.getFoos()); Make the caller responsible for its own optimizations.
If you know that all consumers of your Provider are going to use it in a LinkedList-appropriate way, then just change the return type to LinkedList -- or add another method that returns a LinkedList.
I strongly prefer the former, but both make sense.

You could have the caller create the LinkedList, instead of the provider - i.e. change
List<Foo> foos = provider.getFoos();
to
List<Foo> foos = new LinkedList<>(provider.getFoos());
Then it doesn't matter what kind of list the provider returns. The downside is that an ArrayList is still created, so there is a tradeoff between efficiency and cleanliness.

java.util package - classes vs interfaces

Why is Queue an interface, but others like Stack and ArrayList are classes?
I understand that interfaces are made so that clients can implement them and add on their own methods, whereas with classes if every client needs their methods in there it will become huge and bloated..
...or am I missing something here?

A Queue can be implemented in a number of fashions, as can a List or a Set. They all merely specify a contract for different kinds of collections.
An ArrayList, however, is a particular implementation of a List, made to internally use an array for storing elements. LinkedList is also an implementation of a List, which uses a series of interconnected nodes, i.e. a doubly linked list. Similarly, TreeSet and HashMap are particular implementations of sets and maps, respectively.
Now, Stack is an odd case here, particularly because it is a legacy class from older versions of Java. You really shouldn't use a Stack anymore; instead, you should use its modern equivalent, the ArrayDeque. ArrayDeque is an implementation of a Deque (a double-ended queue) that internally uses an array for storage (which is just what Stack does). A Deque supports all of the operations of a Stack, like pop, push, etc. Other implementations of Deque include LinkedList, as mentioned by someone else, although this deviates from Stack in that its underlying structure is not an array but a doubly-linked list :-p
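For instance, a stack-style use of ArrayDeque might look like this (a small sketch; the reverse helper is made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DequeAsStack {
    // Reverses a string by pushing every character and popping them back off.
    static String reverse(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            stack.push(c);            // adds at the head
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());  // removes from the head: last in, first out
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc"));  // prints cba
    }
}
```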
Now, there are plenty of implementations of Queue, and many different types of Queues. You not only have BlockingQueues (often used for producer-consumer), whose common implementations include LinkedBlockingQueue and ArrayBlockingQueue, but also TransferQueues, and so on. I digress... you can read more on the collections API in the relevant Java Tutorial.

You get the idea of interfaces correctly. In this case Java standard library already provides both implementations and interfaces.
You are better off using an interface so you can switch the implementation at any time.
Hope it makes sense.

I think Stack is well-renowned for being a class that should be an interface. The Java libraries are a bit hit-and-miss when it comes to correctly choosing to provide an interface.
ArrayList is just an implementation of the List interface, so Sun got it correct there! Another classic miss (in my opinion) is the Observable class, which very much needs to be the default implementation of an interface, rather than just a class.

Interesting question. My thought is that Queue is the basis for a lot of data structures, like BlockingQueue, PriorityQueue, Deque, etc. These types need specific implementations of various operations, so it's much simpler to make Queue an interface.

The reason interfaces are used for List and Queue is NOT to reduce excessive code.
The main advantage of interfaces is they allow you to write flexible, loosely coupled code.
(Here's an awesome answer that describes this concept perfectly)
An interface simply defines a list of methods that will be implemented by a class.
This allows us to do a wonderfully powerful thing:
We can treat all classes that implement an interface the same.
This is a HUGE advantage.
Here's a very simple example:
We want to write a debug method that prints every element in a Collection.
Collection is an interface. It defines a list of operations and does not implement them.
You cannot instantiate a Collection. You can instantiate a class that implements Collection.
There are many classes that implement Collection: ArrayList, Vector, TreeSet, LinkedList, etc... They all have different snazzy features, but they also have certain things in common: Because each class implements Collection, they all implement each method found here.
This allows us to do a very powerful thing:
We can write a method that operates on ANY class that implements Collection.
It would look just like this:
public void printCollection(Collection<?> someCollection) {
    // Works for any Collection implementation, whatever its element type.
    for (Object o : someCollection) {
        String s = (o == null) ? "null" : o.toString();
        System.out.println(s);
    }
}
Because of the magic of interfaces, we can now do the following:
public void testStuff() {
    // Three different implementations behind the same interface type.
    Collection<String> s = new TreeSet<>();
    Collection<String> a = new ArrayList<>();
    Collection<String> v = new Vector<>();
    s.add("I am a set");
    a.add("I am an array list");
    v.add("I am a vector");
    printCollection(s);
    printCollection(a);
    printCollection(v);
}

List versus ArrayList as reference type?

Ok so I know that Set, List and Map are interfaces but what makes the first line of code any better than the second line?
List myArr = new ArrayList();
ArrayList myArr = new ArrayList();

If you use the first form, you are saying all you are ever going to use is the functionality of the List interface - nothing else, especially nothing extra added by any implementation of it. This means you can easily change the implementation used (e.g. just substitute LinkedList for ArrayList in the instantiation), and not worry about it breaking the rest of the code because you might have used something specific to ArrayList.

A useful general principle about types in programming (sometimes referred to as the robustness principle) is as follows:
Be liberal about what you accept
Be conservative about what you emit
List is more liberal than ArrayList, since List can be any kind of List implementation e.g. an ArrayList, a LinkedList or FrancosSpecialList. Hence it is a good idea to be liberal and accept any kind of list since you may want to change the implementation later.
The main reason to use ArrayList explicitly as a type (your second case) is if you need to use methods that are specific to ArrayList that are not available through the List interface. In this case a generic List won't work (unless you want to do lots of ugly and confusing casting), so you might as well be explicit and use an ArrayList directly. This has the added bonus of hinting to a reader that specific features of ArrayList are needed.
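As a sketch of that second case: ensureCapacity and trimToSize are declared on ArrayList but not on List, so the local variable has to be typed as ArrayList while the method can still return List. The class and method names here are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayListSpecificDemo {
    static List<Integer> buildSquares(int n) {
        // Typed as ArrayList because the two calls below are not part of List.
        ArrayList<Integer> squares = new ArrayList<>();
        squares.ensureCapacity(n);   // ArrayList-only: pre-size the backing array
        for (int i = 0; i < n; i++) {
            squares.add(i * i);
        }
        squares.trimToSize();        // ArrayList-only: drop unused capacity
        return squares;              // callers only see the List interface
    }

    public static void main(String[] args) {
        System.out.println(buildSquares(4));  // prints [0, 1, 4, 9]
    }
}
```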

As you can see from the source of ArrayList here, most of its methods are annotated with @Override because they are defined by the List interface. So if you only use the basic functionality (which is what you will do most of the time), there won't be any practical difference.
The difference will come if someday you will think that the features of the ArrayList are not suitable anymore for your kind of problem and you will need something different (a LinkedList for example). If you declared everything as List but instantiated as ArrayList you will easily switch to new implementation by changing the instantiations to new LinkedList() while in other case you will have to change also all variable declarations.
Using List list = new ArrayList() is more OOP style since you declare that you don't care about the specific implementation of the list, and that you want to discard the static information about the type since you will rely on the interface provided by this kind of collection abstracting from its implementation.

Java: ArrayList for List, HashMap for Map, and HashSet for Set?

I usually always find it sufficient to use the concrete classes for the interfaces listed in the title. Usually when I use other types (such as LinkedList or TreeSet), the reason is for functionality and not performance - for example, a LinkedList for a queue.
I do sometimes construct an ArrayList with an initial capacity larger than the default of 10, and a HashMap with more than the default 16 buckets, but I usually (especially for business CRUD) never find myself thinking "hmmm... should I use a LinkedList instead of an ArrayList if I am just going to insert and iterate through the whole List?"
I am just wondering what everyone else here uses (and why) and what type of applications they develop.

Those are definitely my default, although often a LinkedList would in fact be the better choice for lists, as the vast majority of lists seem to just iterate in order, or get converted to an array via toArray anyway.
But in terms of keeping consistent maintainable code, it makes sense to standardize on those and use alternatives for a reason, that way when someone reads the code and sees an alternative, they immediately start thinking that the code is doing something special.
I always type the parameters and variables as Collection, Map and List unless I have a special reason to refer to the sub type, that way switching is one line of code when you need it.
I could see explicitly requiring an ArrayList sometimes if you need the random access, but in practice that really doesn't happen.

For some kinds of lists (e.g. listeners) it makes sense to use a CopyOnWriteArrayList instead of a normal ArrayList. For almost everything else the basic implementations you mentioned are sufficient.
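A sketch of the listener case, assuming a hypothetical EventSource class. Iteration over a CopyOnWriteArrayList works on a snapshot, so a listener can remove itself while the event is being fired; with a plain ArrayList the same pattern can throw a ConcurrentModificationException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event source, used only to illustrate the listener-list case.
final class EventSource {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    void removeListener(Runnable listener) {
        listeners.remove(listener);
    }

    void fire() {
        // Iterates over a snapshot; changes made during the loop apply to later calls.
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        source.addListener(new Runnable() {
            @Override
            public void run() {
                System.out.println("one-shot listener");
                source.removeListener(this);  // safe even while fire() is iterating
            }
        });
        source.addListener(() -> System.out.println("permanent listener"));
        source.fire();  // both listeners run
        source.fire();  // only the permanent listener runs
    }
}
```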

Yep, I use those as defaults. I generally have a rule that on public class methods, I always return the interface type (ie. Map, Set, List, etc.), since other classes (usually) don't need to know what the specific concrete class is. Inside class methods, I'll use the concrete type only if I need access to any extra methods it may have (or if it makes understanding the code easier), otherwise the interface is used.
It's good to be pretty flexible with any rules you do use, though, as a dependency on concrete class visibility is something that can change over time (especially as your code gets more complex).

Indeed, always use the base interfaces Collection, List and Map instead of their implementations. To make things even more flexible, you could hide your implementations behind static factory methods, which allow you to switch to a different implementation if you find something better (I doubt there will be big changes in this field, but you never know). Another benefit is that the syntax is shorter thanks to generics.
Map<String, LongObjectClasName> map = CollectionUtils.newMap();
instead of
Map<String, LongObjectClasName> map = new HashMap<String, LongObjectClasName>();
public class CollectionUtils {
    .....
    // The factory methods are static so callers can write CollectionUtils.newList().
    public static <T> List<T> newList() {
        return new ArrayList<T>();
    }
    public static <T> List<T> newList(int initialCapacity) {
        return new ArrayList<T>(initialCapacity);
    }
    public static <T> List<T> newSynchronizedList() {
        return new Vector<T>();
    }
    public static <T> List<T> newConcurrentList() {
        return new CopyOnWriteArrayList<T>();
    }
    public static <T> List<T> newSynchronizedList(int initialCapacity) {
        return new Vector<T>(initialCapacity);
    }
    ...
}

Having just come out of a class about data structure performance, I'll usually look at the kind of algorithm I'm developing or the purpose of the structure before I choose an implementation.
For example, if I'm building a list that has a lot of random accesses into it, I'll use an ArrayList because its random access performance is good, but if I'm inserting things into the list a lot, I might choose a LinkedList instead. (I know modern implementations remove a lot of performance barriers, but this was the first example that came to mind.)
You might want to look at some of the Wikipedia pages for data structures (especially those dealing with sorting algorithms, where performance is especially important) for more information about performance, and the article about Big O notation for a general discussion of measuring the performance of various functions on data structures.

I don't really have a "default", though I suppose I use the implementations listed in the question more often than not. I think about what would be appropriate for whatever particular problem I'm working on, and use it. I don't just blindly default to using ArrayList, I put in 30 seconds of thought along the lines of "well, I'm going to be doing a lot of iterating and removing elements in the middle of this list so I should use a LinkedList".
And I almost always use the interface type for my reference, rather than the implementation. Remember that List is not the only interface that LinkedList implements. I see this a lot:
LinkedList<Item> queue = new LinkedList<Item>();
when what the programmer meant was:
Queue<Item> queue = new LinkedList<Item>();
I also use the Iterable interface a fair amount.

If you are using LinkedList for a queue, you might consider using the Deque interface and ArrayDeque implementing class (introduced in Java 6) instead. To quote the Javadoc for ArrayDeque:
This class is likely to be faster than
Stack when used as a stack, and faster
than LinkedList when used as a queue.
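A short sketch of ArrayDeque behind the Queue interface (the drain helper is a made-up name). The consuming code depends only on Queue, so the implementation could be swapped without changing it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class DequeAsQueue {
    // Removes every element in FIFO order and returns them as a list.
    static List<String> drain(Queue<String> queue) {
        List<String> out = new ArrayList<>();
        String next;
        while ((next = queue.poll()) != null) {  // poll() returns null when empty
            out.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("first");
        queue.offer("second");
        queue.offer("third");
        System.out.println(drain(queue));  // prints [first, second, third]
    }
}
```

Note that ArrayDeque does not permit null elements, which is what makes the null check on poll() unambiguous here.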

I tend to use one of the *Queue classes for queues. However, LinkedList is a good choice if you don't need thread safety.

Using the interface type (List, Map) instead of the implementation type (ArrayList, HashMap) is irrelevant within methods. It's mainly important in public APIs, i.e. method signatures (and "public" doesn't necessarily mean "intended to be published outside your team").
When a method takes an ArrayList as a parameter, and you have something else, you're screwed and have to copy your data pointlessly. If the parameter type is List, callers are much more flexible and can, e.g. use Collections.EMPTY_LIST or Collections.singletonList().
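To make the flexibility concrete, here is a sketch with a hypothetical totalLength method that takes a List parameter. It uses the type-safe Collections.emptyList() rather than the raw EMPTY_LIST constant:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

final class ListParameterDemo {
    // Accepts any List implementation, so callers never need to copy into an ArrayList.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Collections.emptyList()));            // prints 0
        System.out.println(totalLength(Collections.singletonList("abc")));   // prints 3
        System.out.println(totalLength(new LinkedList<>(List.of("a", "bc"))));  // prints 3
    }
}
```

If the parameter were declared as ArrayList<String>, none of these three calls would compile without first copying the data.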

I too typically use ArrayList, but I will use TreeSet or HashSet depending on the circumstances. When writing tests, however, Arrays.asList and Collections.singletonList are also frequently used. I've mostly been writing thread-local code, but I could also see using the various concurrent classes as well.
Also, there were times I used ArrayList when what I really wanted was a LinkedHashSet (before it was available).
