Preferring mutable or immutable collections as method parameters


I'm writing a library method that will be used in several places. One of the method's parameters is a collection of objects, and the method does not mutate this collection. Should the method signature specify a mutable or immutable collection?
Option 1: mutable collection as parameter
public static void foo(List<Bar> list) {
// ...
}
Pros: Clients can pass in whichever of List<Bar> or ImmutableList<Bar> is more convenient for them.
Cons: It is not immediately obvious that the list parameter will not be mutated. Clients must read documentation and/or code to realize this. Clients may make unnecessary defensive copies anyway.
Option 2: immutable collection as parameter
public static void foo(ImmutableList<Bar> list) {
// ...
}
Pros: Clients have a guarantee that the list parameter will not be mutated.
Cons: If the client has a List<Bar>, they must first convert it to an ImmutableList<Bar> before calling foo. This conversion wastes a small amount of time, and it is forced on clients whether they like it or not.
Note: For the purposes of this question, let's assume that all clients will have Guava's ImmutableList available, for example because the library and client code all belong to the same codebase that already uses ImmutableList elsewhere.

As the creator of the foo() API, this is none of your business. You take a list and you don't modify it, and that's it. Your code, in fact, doesn't care about the list's mutability (it's a concern of the caller): so document your intent and stop there.
If the caller needs to guarantee that the list will not be tampered with, they would create defensive copies not because you don't promise to leave the list unchanged, but because they need that guarantee.
It's the same logic by which we perform null checks in method implementations: they're needed because our code must be robust, not because the caller can send a null argument.
In other words, document your method as you intend to implement it, and leave the choice of list implementation to the caller. The reasons for their choice will vary (e.g., it won't always be only about whether you'll modify the list).
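A minimal sketch of that division of responsibility, using only JDK types (List.of stands in for Guava's ImmutableList; ListParams, countLonger, and snapshot are hypothetical names): the method documents that it only reads, and a caller that wants its own guarantee takes its own copy.

```java
import java.util.List;

class ListParams {
    /**
     * Counts the elements longer than {@code minLength}.
     * This method never modifies {@code words}; callers may pass any List,
     * mutable or not.
     */
    static int countLonger(List<String> words, int minLength) {
        int count = 0;
        for (String w : words) {      // read-only traversal
            if (w.length() > minLength) {
                count++;
            }
        }
        return count;
    }

    /** A caller that needs its own guarantee makes its own defensive copy. */
    static List<String> snapshot(List<String> callerOwned) {
        return List.copyOf(callerOwned); // unmodifiable copy (Java 10+)
    }
}
```

Either an ArrayList or an unmodifiable list can be passed to countLonger unchanged; only the caller that actually needs a stable snapshot pays for the copy.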

Leave it at List.
Here are some situations to consider:
Collections.synchronizedCollection does not modify the client's collection.
If it forced clients to input an immutable collection, there'd be no use in making it a synchronized collection, since an immutable collection would already be thread safe.
Collections.frequency does not modify the client's collection.
To check frequency, users would be forced to endure the excess overhead of transferring elements to a new collection.
These reasons aren't why the JDK doesn't expose an immutable interface; those reasons are explained in the documentation. In the case of synchronizedCollection, although it doesn't modify the client's collection, it does return a modifiable view of it, so some would say this example doesn't apply here. However, frequency and other query methods still make the point.
You shouldn't restrict clients solely to advertise safety. There should be more justification; otherwise your attempt to help could become a burden on the client. In some cases, your goal can even contradict what the function or system is trying to achieve, as with the synchronizedCollection example.
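To make the two JDK examples concrete, here is a small sketch (JdkQueryDemo and its methods are hypothetical wrappers around the real Collections.frequency and Collections.synchronizedList): the query method accepts any collection as-is, while the synchronized wrapper writes through to the list it was given, so it is only useful with a modifiable argument.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class JdkQueryDemo {
    /** Collections.frequency only reads, so any Collection works without copying. */
    static int countOf(Collection<?> c, Object target) {
        return Collections.frequency(c, target);
    }

    /**
     * synchronizedList does not modify its argument itself, but the view it
     * returns writes through to it; an immutable argument would defeat its purpose.
     */
    static List<String> addThroughSyncView(List<String> backing, String value) {
        List<String> sync = Collections.synchronizedList(backing);
        sync.add(value); // the element lands in 'backing'
        return sync;
    }
}
```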
It's good to encourage safety, but having your system force it onto your clients without the system benefiting from it would be abuse of power; that's not your decision to make.
ernest_k makes a really good point in his answer, suggesting that you need to analyze what your system should be in charge of, and what the client should be in charge of. In this case, it should be up to the client whether the collection is immutable or not, since your system doesn't care about mutability. As he put it, "it's none of your business", which I agree with.

Related

Why does List.of() in Java not return a typed immutable list?

The list returned by the method List.of(E... elements) in Java is immutable, but this is not visible at all from the type of the created list. The created list simply throws an exception when you try to change it, instead of not offering the possibility to change it at all.
My point is that List.of(E... elements) should return an ImmutableList that extends List. This way the user can decide whether or not to expose the fact that it is immutable.
But I can't find anybody complaining about this or showing alternative solutions. Even Guava and Apache Commons don't do this by default. Only Guava makes it possible to create one (albeit with a lot of code):
List<String> list = new ArrayList<String>(Arrays.asList("one", "two", "three"));
ImmutableList<String> unmodifiableList = ImmutableList.<String>builder().addAll(list).build();
But even this class has (deprecated) add and remove methods.
Can anyone tell me why nobody cares about this (seemingly fundamental) issue?

It's not that nobody cares; it's that this is a problem of considerable subtlety.
The original reason there isn't a family of "immutable" collection interfaces is because of a concern about interface proliferation. There could potentially be interfaces not only for immutability, but synchronized and runtime type-checked collections, and also collections that can have elements set but not added or removed (e.g., Arrays.asList) or collections from which elements can be removed but not added (e.g., Map.keySet).
But it could also be argued that immutability is so important that it should be special-cased, and that there be support in the type hierarchy for it even if there isn't support for all those other characteristics. Fair enough.
The initial suggestion is to have an ImmutableList interface extend List, as
ImmutableList <: List <: Collection
(Where <: means "is a subtype of".)
This can certainly be done, but then ImmutableList would inherit all of the methods from List, including all the mutator methods. Something would have to be done with them; a sub-interface can't "disinherit" methods from a super-interface. The best that could be done is to specify that these methods throw an exception, provide default implementations that do so, and perhaps mark the methods as deprecated so that programmers get a warning at compile time.
This works, but it doesn't help much. An implementation of such an interface cannot be guaranteed to be immutable at all. A malicious or buggy implementation could override the mutator methods, or it could simply add more methods that mutate the state. Any programs that used ImmutableList couldn't make any assumptions that the list was, in fact, immutable.
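A sketch of the sub-interface idea and its weakness, assuming a hypothetical ImmutableList interface (not Guava's class of the same name): the mutators are overridden with deprecated defaults that throw, but any implementing class can still bring back working versions, here simply by extending ArrayList, whose concrete methods take precedence over interface defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sub-interface; NOT Guava's ImmutableList. */
interface ImmutableList<E> extends List<E> {
    /** @deprecated always throws; this list is meant to be immutable */
    @Deprecated
    @Override
    default boolean add(E e) {
        throw new UnsupportedOperationException();
    }

    /** @deprecated always throws; this list is meant to be immutable */
    @Deprecated
    @Override
    default boolean remove(Object o) {
        throw new UnsupportedOperationException();
    }

    /** @deprecated always throws; this list is meant to be immutable */
    @Deprecated
    @Override
    default void clear() {
        throw new UnsupportedOperationException();
    }
}

/**
 * A buggy (or malicious) implementation: ArrayList's concrete add/remove/clear
 * win over the interface defaults, so the "immutable" type is not honored.
 */
class LeakyImmutableList<E> extends ArrayList<E> implements ImmutableList<E> {
}
```

Calling add on an ImmutableList-typed reference to a LeakyImmutableList compiles (with a deprecation warning) and succeeds, which is exactly why code receiving this type could not assume anything.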
A variation on this is to make ImmutableList be a class instead of an interface, to define its mutator methods to throw exceptions, to make them final, and to provide no public constructors, in order to restrict implementations. In fact, this is exactly what Guava's ImmutableList has done. If you trust the Guava developers (I think they're pretty reputable) then if you have a Guava ImmutableList instance, you're assured that it is in fact immutable. For example, you could store it in a field with the knowledge that it won't change out from under you unexpectedly. But this also means that you can't add another ImmutableList implementation, at least not without modifying Guava.
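The class-based variation can be sketched like this, assuming a hypothetical SealedList class (Guava's real ImmutableList is considerably more elaborate): the constructor is package-private so code outside the package cannot add implementations, the mutators are final, and a static factory copies its input.

```java
import java.util.AbstractList;
import java.util.Collection;

/** Hypothetical sketch of "final mutators, no public constructor". */
abstract class SealedList<E> extends AbstractList<E> {
    SealedList() { } // package-private: code outside this package cannot subclass

    /** The static factory is the only public way to obtain an instance. */
    @SafeVarargs
    public static <E> SealedList<E> of(E... elements) {
        return new ArrayBacked<>(elements.clone()); // defensive copy of the input
    }

    // Mutators are final, so no subclass can re-enable them.
    @Override public final boolean add(E e) { throw new UnsupportedOperationException(); }
    @Override public final void add(int i, E e) { throw new UnsupportedOperationException(); }
    @Override public final E set(int i, E e) { throw new UnsupportedOperationException(); }
    @Override public final E remove(int i) { throw new UnsupportedOperationException(); }
    @Override public final boolean remove(Object o) { throw new UnsupportedOperationException(); }
    @Override public final void clear() { throw new UnsupportedOperationException(); }
    @Override public final boolean addAll(Collection<? extends E> c) { throw new UnsupportedOperationException(); }

    private static final class ArrayBacked<E> extends SealedList<E> {
        private final Object[] elements;

        ArrayBacked(Object[] elements) { this.elements = elements; }

        @SuppressWarnings("unchecked")
        @Override public E get(int index) { return (E) elements[index]; }

        @Override public int size() { return elements.length; }
    }
}
```

Most other mutating operations inherited from AbstractList, such as iterator().remove(), end up calling one of these final methods and therefore also throw.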
A problem that isn't solved by this approach is the "scrubbing" of immutability by upcasting. A lot of existing APIs define methods with parameters of type Collection or Iterable. If you were to pass an ImmutableList to such a method, it would lose the type information indicating that the list is immutable. To benefit from this, you'd have to add immutable-flavored overloads everywhere. Or, you could add instanceof checks everywhere. Both are pretty messy.
(Note that the JDK's List.copyOf sidesteps this problem. Even though there are no immutable types, it checks the implementation before making a copy, and avoids making copies unnecessarily. Thus, callers can use List.copyOf to make defensive copies with impunity.)
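A small illustration of the List.copyOf behavior described above, assuming a hypothetical DefensiveCopies holder class: the constructor copies unconditionally, and current JDKs skip the copy when handed a list that is already unmodifiable. The Javadoc only promises that copyOf will "generally" not copy such a list, so treat the skipped copy as an optimization rather than a guarantee.

```java
import java.util.List;

class DefensiveCopies {
    private final List<String> names;

    DefensiveCopies(List<String> names) {
        // Safe to call unconditionally: if 'names' already came from
        // List.of/List.copyOf, the JDK generally returns it as-is.
        this.names = List.copyOf(names);
    }

    List<String> names() {
        return names; // already unmodifiable, no need to wrap again
    }
}
```

Note that List.copyOf rejects null elements with a NullPointerException, and that an unmodifiable view from Collections.unmodifiableList is still copied, because the list behind it can change.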
As an alternative, one might argue that we don't want ImmutableList to be a sub-interface of List, we want it to be a super-interface:
List <: ImmutableList
This way, instead of ImmutableList having to specify that all those mutator methods throw exceptions, they wouldn't be present in the interface at all. This is nice, except that this model is completely wrong. Since ArrayList is a List, that means ArrayList is also an ImmutableList, which is clearly nonsensical. The problem is that "immutable" implies a restriction on subtypes, which can't be done in an inheritance hierarchy. Instead, it would need to be renamed to allow capabilities to be added as one goes down the hierarchy, for example,
List <: ReadableList
which is more accurate. However, ReadableList is altogether a different thing from an ImmutableList.
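A sketch of the capability-style hierarchy, with hypothetical interfaces ReadableList and WritableList (neither exists in the JDK): code that receives a ReadableList cannot call add through that reference, yet the object may still be mutated by whoever holds it as a WritableList, which is why "readable" is not the same promise as "immutable".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only capability interface (not a JDK type). */
interface ReadableList<E> {
    int size();
    E get(int index);
}

/** Mutability is added further down the hierarchy: WritableList <: ReadableList. */
interface WritableList<E> extends ReadableList<E> {
    void add(E element);
}

class SimpleList<E> implements WritableList<E> {
    private final List<E> elements = new ArrayList<>();

    @Override public int size() { return elements.size(); }
    @Override public E get(int index) { return elements.get(index); }
    @Override public void add(E element) { elements.add(element); }
}

class ReadableDemo {
    /** Can only read through this type, but the list may still change underneath. */
    static int sizeOf(ReadableList<?> list) {
        return list.size();
    }
}
```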
Finally, there are a bunch of semantic issues that we haven't considered. One concerns immutability vs. unmodifiability. Java has APIs that support unmodifiability, for example:
List<String> alist = new ArrayList<>(...);
??? ulist = Collections.unmodifiableList(alist);
What should the type of ulist be? It's not immutable, since it will change if somebody changes the backing list alist. Now consider:
???<String[]> arlist = List.of(new String[] { ... }, new String[] { ... });
What should the type be? It's certainly not immutable, as it contains arrays, and arrays are always mutable. Thus it's not at all clear that it would be reasonable to say that List.of returns something immutable.
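Both cases can be seen with a few lines of JDK code (UnmodifiableVsImmutable is just a hypothetical holder for the two helpers): the unmodifiable view grows when its backing list does, and the elements of a List.of list of arrays can still be overwritten.

```java
import java.util.Collections;
import java.util.List;

class UnmodifiableVsImmutable {
    /** Returns a read-only view; it reflects later changes to 'backing'. */
    static List<String> viewOf(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    /** The list itself is unmodifiable, but the arrays inside it are not. */
    static List<String[]> listOfArrays() {
        return List.of(new String[] {"a"}, new String[] {"b"});
    }
}
```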

Removing add, remove, etc. from all the Collection types and creating subinterfaces MutableCollection, MutableList, MutableSet would double the number of Collection interfaces, which is a complexity cost to be considered. Furthermore, Collections aren't cleanly separated into Mutable and Immutable: Arrays.asList supports set, but not add.
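A short sketch of the Arrays.asList case, using a hypothetical FixedSizeDemo helper: set() is supported and writes through to the array, while add() is rejected with UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.List;

class FixedSizeDemo {
    /** Arrays.asList returns a fixed-size view: set() writes through to the array. */
    static String[] replaceFirst(String[] array, String value) {
        List<String> fixed = Arrays.asList(array);
        fixed.set(0, value); // supported; array[0] changes too
        return array;
    }

    /** Returns true if add() is rejected, as it is for Arrays.asList. */
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```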
Ultimately there's a tradeoff to be made about how much to capture in the type system and how much to enforce at runtime. Reasonable people can disagree as to where to draw the line.

I would say that since commonly collections tend to (or at least should) be treated as "immutable by default" (meaning you're rarely modifying collections that you didn't create), it's not very important to specify that "this is immutable". It would be more useful to specify "you can safely modify this collection if you wish".
Secondly, your suggested approach wouldn't work. You can't extend List and hide methods, so the only option would be to make it return an ImmutableList that's not a subtype of List. That would make it useless, as it would require a new ImmutableList interface, and any existing code wouldn't be able to use it.
So is this optimal design? No, not really, but for backwards compatibility that's not going to change.

Avoiding semantic coupling with Java Collection interfaces

These days I am reading the book Code Complete, and I've passed the part about levels of coupling (simple-data-parameter, simple-object, object-parameter, and semantic coupling), with semantic coupling being the "worst" kind:
The most insidious kind of coupling occurs when one module makes use not of some syntactic element of another module but of some semantic knowledge of another module’s inner workings.
The examples in the book usually lead to run-time failures, and are typical bad code, but today I had a situation that I'm really not sure how to treat.
First I had a class, let's call it Provider, fetching some data and returning a List of Foos.
class Provider
{
public List<Foo> getFoos()
{
ArrayList<Foo> foos = createFoos();//some processing here where I create the objects
return foos;
}
}
A consuming class executes an algorithm that processes the Foos, merging them or removing them from the List based on some attributes (not really important). The algorithm does all of its processing at the head of the list, so there are a lot of operations reading, removing, and adding at the head.
(I just realized I could have made the algorithm looking like merge sort, recursively calling it on halves of an array, but that doesn't answer my question :) )
I noticed I was returning an ArrayList, so I changed the providing class's getFoos method to return a LinkedList. ArrayList has O(n) complexity for head removals, while LinkedList has constant complexity. But then it struck me that I may be creating a semantic dependency. The code will certainly work with both implementations of List and there are no side effects, but performance may degrade with the wrong one. I wrote both classes, so I understand this, but what if a colleague had to implement the algorithm, or used the Provider as a source for another algorithm that favors random access? If he doesn't bother with the internals, as he shouldn't have to, I would ruin his algorithm's performance.
Declaring the method to return LinkedList is also possible, but what about the "program to interface" principle?
Is there any way to handle this situation, or is my design flawed at its roots?

The general problem is, how does a producer return something in the form that the consumer prefers? Usually the consumer needs to include the preference in the request. For example
as a flag - getFoos(randomOrLinked)
different methods - getFoosAsArrayList(), getFoosAsLinkedList()
pass a function that creates desired List - getFoos(ArrayList::new)
or pass a desired output List - getFoos(new ArrayList())
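The third option (passing a factory) might look like this, assuming the question's Provider with String standing in for Foo and a hypothetical createFoos() body; java.util.function.Supplier carries the consumer's preferred List implementation.

```java
import java.util.List;
import java.util.function.Supplier;

class Provider {
    /** Hypothetical stand-in for the question's createFoos() processing. */
    private List<String> createFoos() {
        return List.of("foo1", "foo2", "foo3");
    }

    /** The consumer chooses the List implementation via a factory. */
    <L extends List<String>> L getFoos(Supplier<L> factory) {
        L result = factory.get();
        result.addAll(createFoos());
        return result;
    }
}
```

A consumer that works at the head of the list would call provider.getFoos(LinkedList::new), while one that needs random access would call provider.getFoos(ArrayList::new); the provider itself no longer has to guess.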
But the producer may have the right to say: this is too complicated for me, I don't care. I'll return a form that's suitable for most use cases, and the consumer needs to handle it properly. If I think ArrayList is best, I'll just do it. (Actually you may have a better choice, a ring structure, that suits both use cases under consideration.)
Of course, it should be well documented. Or you could be honest and declare ArrayList as the return type in the method signature, as long as you commit to it. Don't worry too much about "interface": ArrayList is an interface (in the general sense), Iterable is an interface, so what's so special about the List interface that sits between the two?
There is another criticism of your design: you return a mutable data structure that the consumer can modify directly. That is less desirable than returning a read-only data structure. If you can, return a read-only view of the underlying data; constructing the view should be inexpensive. The consumer needs to make its own copy if it needs a mutable one.

You need to make a compromise somewhere. In this case, there are two compromises that make sense to me:
If you don't know that all consumers of your Provider will be performing operations that are appropriate to a LinkedList, then stick with the signature that has return type List with implementation that returns ArrayList (ArrayList is a good all-around List implementation). Then within the calling method, wrap the returned List in a LinkedList: LinkedList<Foo> fooList = new LinkedList<>(provider.getFoos()); Make the caller responsible for its own optimizations.
If you know that all consumers of your Provider are going to use it in a LinkedList-appropriate way, then just change the return type to LinkedList -- or add another method that returns a LinkedList.
I strongly prefer the former, but both make sense.

You could have the caller create the LinkedList, instead of the provider - i.e. change
List<Foo> foos = provider.getFoos();
to
List<Foo> foos = new LinkedList<>(provider.getFoos());
Then it doesn't matter what kind of list the provider returns. The downside is that an ArrayList is still created, so there is a tradeoff between efficiency and cleanliness.

Returning a private collection using a getter method in Java

I have a number of Java classes that use private sets or lists internally. I want to be able to return these sets/lists using a get...List() method.
The alternatives I am considering:
return a reference to the internal object
construct a new set/list and fill it up (this seems bad practice?)
use Collections.unmodifiableList(partitions);
Which of these is the most common / best way to solve this issue?

There are many aspects to consider here. As others already have pointed out, the final decision depends on what your intention is, but some general statements regarding the three options:
1. return a reference to the internal object
This may cause problems. You can hardly ever guarantee a consistent state when you do this. The caller might obtain the list and then do nasty things:
List<Element> list = object.getList();
list.clear();
list.add(null);
...
Maybe not with a malicious intention but accidentally, because he assumed that it was safe/allowed to do this.
2. construct a new set/list and fill it up (this seems bad practice?)
This is not a "bad practice" in general. In any case, it's by far the safest solution in terms of API design. The only caveat here may be that there might be a performance penalty, depending on several factors. E.g. how many elements are contained in the list, and how the returned list is used. Some (questionable?) patterns like this one
for (int i=0; i<object.getList().size(); i++)
{
Element element = object.getList().get(i);
...
}
might become prohibitively expensive, because every call to getList() creates a new copy (although one could argue whether, in this particular case, it was the fault of the user who implemented it like that, the general issue remains valid).
3. use Collections.unmodifiableList(partitions);
This is what I personally use rather often. It's safe in the sense of API design, and involves only a negligible overhead compared to copying the list. However, it's important for the caller to know whether this list may change after he obtained a reference to it.
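The practical difference between the three options from the question can be sketched with a hypothetical Partitions class (the method names are illustrative): the copy is frozen at the moment of the call, the unmodifiable view is cheap and follows later changes, and the raw reference lets the caller change the object's internal state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Partitions {
    private final List<String> partitions = new ArrayList<>();

    void addPartition(String p) { partitions.add(p); }

    /** Option 1: the caller can modify our internal state directly. */
    List<String> getReference() { return partitions; }

    /** Option 2: an independent snapshot; costs O(n) per call. */
    List<String> getCopy() { return new ArrayList<>(partitions); }

    /** Option 3: a cheap read-only view that tracks later changes. */
    List<String> getView() { return Collections.unmodifiableList(partitions); }
}
```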
This leads to...
The most important recommendation:
Document what the method is doing! Don't write a comment like this
/**
* Returns the list of elements.
*
* @return The list of elements.
*/
public List<Element> getList() { ... }
Instead, specify what you can make sure about the list. For example
/**
* Returns a copy of the list of elements...
*/
or
/**
* Returns an unmodifiable view on the list of elements...
*/
Personally, I'm always torn between the two options that one has for this sort of documentation:
Make clear what the method is doing and how it may be used
Don't expose or overspecify implementation details
So for example, I'm frequently writing documentations like this one:
/**
* Returns an unmodifiable view on the list of elements.
* Changes in this object will be visible in the returned list.
*/
The second sentence is a clear and binding statement about the behavior. It's important for the caller to know that. For a concurrent application (and most applications are concurrent in one way or the other), this means that the caller has to assume that the list may change concurrently after he obtained the reference, which may lead to a ConcurrentModificationException when the change happens while he is iterating over the list.
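A single-threaded sketch of that hazard, assuming a hypothetical ElementHolder class documented as above: modifying the holder while iterating over the view it returned makes the ArrayList iterator underneath fail fast. The JDK describes fail-fast behavior as best-effort, and with real concurrency the outcome is less predictable, but in practice this single-threaded pattern throws.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

class ElementHolder {
    private final List<String> elements = new ArrayList<>();

    void add(String e) { elements.add(e); }

    /**
     * Returns an unmodifiable view on the list of elements.
     * Changes in this object will be visible in the returned list.
     */
    List<String> getList() { return Collections.unmodifiableList(elements); }

    /** Returns true if changing the holder mid-iteration broke the iteration. */
    static boolean iterationFailsOnChange(ElementHolder holder) {
        try {
            for (String s : holder.getList()) {
                holder.add(s + "!"); // modifies the backing list during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```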
However, such detailed specifications limit the possibilities for changing the implementation afterwards. If you later decide to return a copy of the internal list, then the behavior will change in an incompatible way.
So sometimes I also explicitly specify that the behavior is not specified:
/**
* Returns an unmodifiable list of elements. It is unspecified whether
* changes in this object will be visible in the returned list. If you
* want to be informed about changes, you may attach a listener to this
* object using this-and-that method...
*/
These questions are mainly important when you intend to create a public API. Once you have implemented it in one way or another, people will rely on the behavior in one way or another.
So coming back to the first point: It always depends on what you want to achieve.

Your decision should be based on one thing (primarily)
Allow other methods to modify the original collection ?
Yes : return a reference of the internal object.
No :
construct a new set/list and fill it up (this seems bad practice? -- No. Not at all. This is called Defensive programming and is widely used).
use Collections.unmodifiableList(partitions);

return a reference to the internal object
In this case the receiving end can modify the object's set or list, which might not be what you want. If you do want to allow users to modify the object's state, then it is the simplest approach.
construct a new set/list and fill it up (this seems bad practice?)
This is an example of a shallow copy: adding or removing elements in the returned collection does not affect the original, but the element objects are shared, so any change to an element's state will affect the actual collection.
use Collections.unmodifiableList(partitions);
In this case it returns an unmodifiable view of the specified list. This method allows modules to provide users with "read-only" access to internal lists. This is good practice in situations where you want to keep the object's state safe.

I believe the best solution is to return an unmodifiable list. If compared to the construction of a new list, returning an unmodifiable "proxy" of the original list may save the client from implicitly generating a lot of unnecessary lists. On the other hand, if the client really needs to have a modifiable list, let it create a new list by itself.
The problem you still have to consider is that the objects contained in the list may be modified. There is no cheap and easy const-correctness in Java.
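A quick illustration of that remaining problem, using StringBuilder as a stand-in for any mutable element type and a hypothetical ShallowCopyDemo helper: the returned list is an unmodifiable copy, but its elements are the same objects as in the internal list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ShallowCopyDemo {
    /** Copies the list structure only; the element objects are shared. */
    static List<StringBuilder> unmodifiableCopy(List<StringBuilder> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```

Calling append on an element obtained from the returned list changes the very same object that the internal list holds.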

The second option is definitely the right way to go.
The other two options depend on your requirements.
If you are not going to modify the list values outside the class, return an unmodifiable list.
Otherwise, just return the reference.

Taking an ImmutableCollection as parameter versus creating a local copy

When using Guava's ImmutableCollection as a parameter for a function, is it better to require an ImmutableCollection as the parameter type:
<T> void foo(ImmutableCollection<T> l)
or should the function take a Collection<T> and create an immutable collection itself as in
<T> void foo(Collection<T> l)
{
ImmutableCollection<T> l2 = ImmutableCollection.copyOf(l);
// ...
}
The first version seems preferable because the caller is sure that the collection he passes to the function is not modified by it. But the first version requires client code with a collection to call copyOf(), that is:
Collection collection = map.values();
foo(ImmutableCollection.copyOf(collection));
// instead of simply
foo(collection);
PS: This is not completely true, since ImmutableCollection does not have copyOf() but ImmutableList and ImmutableSet do.

I think that it depends on what the foo function is supposed to do with the collection argument.
If foo is going to read the collection elements, then <T> void foo(Collection<T> l) is preferable, because it leaves the decision to the caller.
If foo is going to incorporate the collection into the state of some object, then an immutable collection may be preferable. However, we then need to ask ourselves whether it should be the foo method's responsibility to deal with this, or the caller's responsibility.
There isn't a single right (or "best practice") answer to this. However, using ImmutableCollection as the parameter's formal type could result in complexity and/or unnecessary copying in some cases.

Look at the Guava docs: "copyOf is smarter than you think."
So you can use the generic Collection interface with no regrets for performance.
Whether the copy is necessary (rather than a function comment) depends, in my view, on how long you're holding on to the data.

Use the more generic Collection interface; it's much better for you to write the call once than to require all of the clients to do so on every call. If you're really concerned about performance, and profiling shows it to be an issue, you could do a class check on the incoming collection to see whether you can avoid the copy.
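A minimal sketch of that recommendation, with List.copyOf swapped in as the JDK counterpart of Guava's ImmutableList.copyOf and a hypothetical Registry class in place of foo: the method accepts any Collection, so the caller can pass map.values() directly, and the copy is taken once inside.

```java
import java.util.Collection;
import java.util.List;

class Registry<T> {
    private final List<T> items;

    /**
     * Accepts any Collection and takes its own immutable snapshot.
     * List.copyOf is used here as a stand-in for Guava's ImmutableList.copyOf.
     */
    Registry(Collection<? extends T> items) {
        this.items = List.copyOf(items);
    }

    List<T> items() { return items; }
}
```

Callers can write new Registry<>(map.values()) without any copyOf() at the call site, and later changes to the map do not leak into the stored list.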

It depends on what foo is doing. Most probably it simply reads the collection values, in which case it does not need to make a copy of it, least of all an immutable one.

The one advantage of using ImmutableCollection is that the method guarantees that it will not modify the collection. But that guarantee is for the user only; the platform does not understand it, so you might as well express it in comments or a custom annotation.

Java: Does Collections.unmodifiableXYZ(...) in special cases make a collection object a flyweight?

The possible answers are either "never" or "it depends".
Personally, I would say, it depends.
Following usage would make a collection appear (to me) to be a flyweight:
public final static List<Integer> SOME_LIST =
Collections.unmodifiableList(
new LinkedList<Integer>(){ // scope begins
{
add(1);
add(2);
add(3);
}
} // scope ends
);
Right? You can't ever change it, because the only place where the "original" collection object (which could be changed) is known is the scope inside unmodifiableList's parameter list, which ends immediately.
The second thing is: when you retrieve an element from the list, it's an Integer, which itself is a flyweight.
Other obvious cases where final static and unmodifiableList are not used would not be considered flyweights.
Did I miss something?
Do I have to consider some internal aspects of LinkedList that could compromise the flyweight?

I think you are referring to the flyweight pattern. The fundamental idea of this pattern is that you are dealing with complex objects whose instances can be reused and that put out different representations through their methods.
To make such an object work correctly, it should be immutable.
Immutability is clearly given when creating a List the way you described.
But since there are no external objects or parameters that SOME_LIST operates on, I would not call this an example of the flyweight pattern.
Another typical property of the flyweight pattern is the "interning" of such objects. When creating just a single instance of an object, this does not make sense.
If you deal a lot with lists that are passed around from one object to another and you want to ensure immutability, a better option might be to use Google Collections (now Guava):
final static ImmutableList<Integer> someList = ImmutableList.of(1, 2, 3);
Of course, it is also possible to construct more complex immutable objects with builders.
This creates an instance of an immutable list. It still implements the List interface, but refuses to execute any add(), addAll(), set(), or remove() operation.
So you can still pass it to methods where a List is required, yet be sure that its content is not altered.

I think your examples are of immutable objects; a flyweight is something quite different. Immutable objects are candidates for flyweights, but a flyweight doesn't have to be immutable; it just has to be designed to save memory.

Having the library detect that the mutable List has not otherwise escaped is a bit of an ask, although theoretically possible.
If you serialise the returned object, then trusted code could view the internal object. Although the serialised forms of the classes are documented, it's not documented that the method uses those classes.
In practical terms, any cache is down to the user of the API.
(Why LinkedList for an immutable list, by the way? Other than that it changes which unmodifiable wrapper implementation you get.)

Integer is only a flyweight from -128 to 127.
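The cache behind that remark is the one used by Integer.valueOf (and therefore by autoboxing); its Javadoc guarantees caching for -128 to 127 inclusive and allows caching more values, so the exact upper bound can vary between JVM configurations. A hypothetical helper makes the identity check explicit:

```java
class IntegerCacheDemo {
    /** Autoboxing uses Integer.valueOf, which always caches -128..127. */
    static boolean sameInstance(int a, int b) {
        Integer x = a; // boxed via Integer.valueOf(a)
        Integer y = b;
        return x == y; // reference comparison, not equals()
    }
}
```

Comparing boxed values outside that range with == typically returns false under default settings, which is why equals() should be used for value comparison.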
See also http://www.javaworld.com/javaworld/jw-07-2003/jw-0725-designpatterns.html.

Develop Reference
